TAMPERE UNIVERSITY OF TECHNOLOGY Institute of Digital and Computer Systems. Exercise 2: DLX I - Architecture
TKT-3206 Computer Architecture I

Group:
Name:
Student number:
General info about the exercise

The purpose of this exercise is to explain how a pipelined processor works and which factors affect its performance. The exercise uses a simulator of a pipelined DLX processor. The simulator was developed at Lund University, Sweden.

To complete the exercise, return this document with your answers filled in. Give clear and brief, yet sufficiently accurate answers to all the questions. Write legibly, using a pencil so that possible errors can be corrected. Unclear, unintelligible, or uninformative answers will be returned for correction (a "boomerang") or may require redoing the whole exercise from the beginning.

DLX-simulator

In this exercise, a data path model of a DLX processor is studied. A comprehensive model that resolves hazards effectively is developed in three steps. The first model, Datapath Model 1 (DP_1), lacks support for the bypassing technique. The second model, Datapath Model 2 (DP_2), supports bypassing but uses the ALU for calculating the jump addresses. The third model, DLX, features a separate adder for the jump address calculation. The simulator supports only integer instructions. The DLX-CONTROL exercise will study the implementation of control logic for the DLX processor model.

The simulator supports symbolic machine language written for the DLX processor. Assembler code for the simulator can be written using a regular text editor. The extension of the code file must be '.s', and the last instruction of the program must be trap 0.

Installation

Go to your home directory in Birdland. Copy the simulator and the program code files to yourself (first make sure that you have at least 1.1 MB of quota left) using the command:

    cp -r ~rhu/public_html/tktekn/dlx/.

Now you have all the files needed for this exercise and the DLX-CONTROL exercise in the subdirectory DLX. Go to this subdirectory (cd DLX).
Starting the simulator

The simulator is started with the command:

    sparc_pipe

(or ./sparc_pipe if the current directory is not in the search path of your environment settings)
and the following window opens:

The window shows the five stages of the processor, which are (from the left): IF (Instruction Fetch), ID (Instruction Decode), EX (Execute), MEM (Memory Access) and WB (Write Back to register file). The processor is pipelined by inserting D flip-flops (registers) between the stages. On each clock cycle, the intermediate results of each stage are saved into these registers, which the simulator shows as boxes on top of the dashed lines. The functional units that perform the calculation reside between the dashed lines. The calculation proceeds from left to right, with the exception of the write-backs. The bold lines represent 32-bit buses which connect the functional units and registers. The thin lines are one-bit control signals. A diagram that groups clock-synchronized memory elements on vertical lines in this way is called a Werner diagram.

The instruction executed in each stage is shown in the box below the stage. The contents of the register file can be examined by clicking the left mouse button on top of the Register File box in the simulator. The contents of the register file cannot be altered manually.

The functions of the simulator

In the top left-hand corner of the window are two pull-down menus: File and Views. Assembler code is loaded with the Load command of the File menu, and the simulator is exited with the Quit command. The data path model to be used (DP_1, DP_2 or DLX) is chosen from the Views menu. Each data path model has three options that are used in examining hazards: Pipelining/No Pipelining, Delayed Load/No Delayed Load and Delayed Branch/No Delayed Branch. These options can be changed by clicking the boxes at the top of the simulator window.

Before executing a program, the Program Counter (PC) and the Register File must be reset. This is done by clicking the Reset box. Note that Reset does not reset the data memory; it can be reset, for example, by (re)loading the program code.
The program can be executed one clock cycle at a time by clicking the box Clock. Clicking Run executes the whole program. Elapsed Cycles shows the number of clock cycles that have passed since the first instruction entered the IF stage or, if Run is clicked, the total number of clock cycles from the start until the last instruction reaches the WB stage.
Clock cycles Per Instruction (CPI) is calculated with the following formula:

\[
\mathrm{CPI} = \frac{\mathrm{NumberOfClockCycles}}{\mathrm{NumberOfInstructions(non\text{-}NOP)}} = \frac{\mathrm{NumberOfClockCycles}}{\mathrm{NumberOfClockCycles} - \mathrm{Hazards} - \mathrm{FillingThePipeline}}
\]

Hazards is the number of clock cycles needed to resolve the hazards. It is the sum of the NOP instructions that reach the WB stage after the first instruction has reached the WB stage, and of the clock cycles during which the pipeline is stalled due to hazards. FillingThePipeline is the number of clock cycles spent filling the pipeline.

Sub-operations of the DLX instructions

Examine the following program:

    LW   R3,18(R0)
    ADD  R1,R3,R3
    SW   18(R0),R1

What does M[18+R0] contain after the program execution, if M[18+R0] is before the execution?

Next, we will study which sub-operations an unpipelined data path performs in each of its stages for the instructions of the above code. By running the program without pipelining, we can concentrate on the execution of a single instruction at a time and examine what happens in the different pipeline stages.

The program code exempel1.s is loaded into the simulator by selecting File->Load. Type the name of the code file and click OK. Make sure that the data path settings are: Datapath Model 1, No Pipelining, No Delayed Load and No Delayed Branch.

Give a clock pulse by clicking the box Clock. What instruction is read into the processor? What does that particular instruction do?

Give a clock pulse. Enter data into table 1 (the abbreviations of the sub-operations are listed in table 2). What is being done in the ID stage?

Give a clock pulse. Where does the ALU get its operands? What is the result of the ALU operation? What was calculated in the ALU?

Give a clock pulse. What value does DMAR get? What is the DMAR bus connected to? What sub-operation was performed in the EX stage?
What does the register R3 contain (click the Register File)?

Give a clock pulse. What does the register R3 now contain and why?

The instruction LW R3,18(R0) has now been executed. Mark the performed sub-operations into table 1 using the abbreviations listed in table 2. On the last row, mark which functional unit in the Werner diagram performs the operation in each stage.

Table 1: The sub-operations of the example code.

    Class                 IF     ID     EX     MEM    WB
    LOAD instruction
    ALU instruction
    STORE instruction
    BRANCH instruction
    Functional unit

Table 2: Sub-operation abbreviations for table 1.

    Sub-operation                    Abbreviation
    Instruction Fetch                IF
    Register Read                    RR
    Arithmetic/logical Operation     AO
    Operand address Calculation      OC
    Jump address Calculation         JC
    Memory Read                      MR
    Memory Write                     MW
    Register Write                   RW
    Load Program Counter             LPC
    no sub-operation                 -

Give a clock pulse. The instruction ADD R1,R3,R3 is fetched now. Answer the following questions by giving clock pulses.

What is done in the EX stage?

Which sub-operation was performed in the MEM stage?
Examine the contents of registers R1 and R3 before you forward the instruction into the WB stage. What do R1 and R3 contain?

    R1 =        R3 =

Give a clock pulse. What does the register R1 now contain and why?

    R1 =

Fill into table 1 the sub-operations that were now performed.

The next instruction is SW 18(R0),R1. Execute the instruction by giving enough clock pulses and answer the following questions.

What does the instruction do?

Which registers were read in the ID stage?

What was done in the EX stage?

Which sub-operation was performed in the MEM stage?

A special bus was used in transferring the contents of register R1. Why?

Which sub-operation is performed in the WB stage?

Fill in the sub-operations of the instruction into table 1.

When determining the stage in which a given sub-operation is performed, it does not matter where the corresponding functional unit resides. The stage of a sub-operation is defined by the location of the instruction at the moment the sub-operation is executed. For example, the register write is performed in the WB stage even though the register file is in the ID stage. This is an important property when the sub-operations of a BRANCH instruction are examined.

Finally, we study the sub-operations of the instruction BEQZ R0,14.

What is done in the ID stage?

What is done in the EX stage? What is the output of the ALU?
What controls the multiplexer MX1? In which situations does the control signal of MX1 change its state?

The value of the PC changes as you move the instruction into the WB stage. What is the new PC value? Why does the PC value change to this particular number?

Now fill in the missing fields of table 1.

We can see from table 1 that for some instructions a sub-operation is performed in every stage, while other instructions perform no sub-operation in some of their stages. For which instruction classes, and in which stages, is no sub-operation performed?

So far, each instruction has been executed separately. An important measure of the efficiency of a program is how many clock cycles are needed on average to execute one instruction. This figure is called Clock cycles Per Instruction (CPI). What is the CPI of the executed program and why?

Parallelism and pipelining

For pipelining to be possible, every instruction must be divisible into sub-operations, and these sub-operations must be performed in order. However, the instructions do not have to use every sub-operation. Furthermore, the processor must have enough functional units and buses so that each pipeline stage can execute a sub-operation on every clock cycle. If this condition is not met, the pipeline must be stalled until the busy functional unit or bus becomes available again. This situation is referred to as a structural hazard.

Mark into table 1 which pipeline stages use the following functional units: data memory, instruction memory, ALU, and register file.

Why can two pipeline stages use the register file simultaneously without causing a structural hazard? What are these stages?

What architectural properties must the memory have in order to avoid a structural hazard?
From now on, we assume that the memory has been constructed so that there are no structural hazards.

Examine the following program:

    ADDI R1,R0,#1   ;R1 <- 1
    ADDI R2,R0,#2   ;R2 <- 2
    ADDI R3,R0,#4   ;R3 <- 4
    ADD  R1,R1,R1   ;R1 <- R1+R1
    ADD  R2,R2,R2   ;R2 <- R2+R2
    ADD  R3,R3,R3   ;R3 <- R3+R3
    ADD  R1,R1,R1   ;R1 <- R1+R1
    ADD  R2,R2,R2   ;R2 <- R2+R2
    ADD  R3,R3,R3   ;R3 <- R3+R3

Determine by examining the code what the registers R1, R2 and R3 contain after the execution. Give the answer in hexadecimal.

    R1 =        R2 =        R3 =

The program is in file exempel2.s. Load it into the simulator and run it (Run). How many clock cycles does the execution take when only one instruction is performed at a time? (DP_1, No Pipelining)

Now enable the pipelining (Pipelining); the processor resets. Give clock pulses until the first instruction reaches the MEM stage. How many clock cycles does it take?

After this, one instruction is finished on every clock cycle. We can say that the pipeline is now full.

Give a clock pulse. Which register is updated and which registers are read?

How many instructions are performed simultaneously in the pipeline?

Give clock pulses until the last instruction is in the WB stage. What are the contents of registers R1, R2 and R3?

    R1 =        R2 =        R3 =

How many clock cycles did the execution take?

If the clock cycles needed for filling the pipeline are not counted, how much faster is the execution when pipelining is used? Also write down the formula you used to calculate your result.

From this we can conclude that pipelining significantly increases the speed of the calculation if there are no structural hazards. In practice, however, some problems reduce this speed-up. Next, we will study what exactly these problems are.
Data hazards

Next, we will determine what kind of special mechanisms are required to guarantee correct program execution in spite of data dependencies between the instructions.

Bypassing

Examine the following program:

    ADDI R1,R0,#2   ;R1 <- 2
    ADD  R2,R1,R1   ;R2 <- R1+R1
    ADD  R3,R1,R1   ;R3 <- R1+R1
    ADD  R4,R1,R1   ;R4 <- R1+R1

What should the registers R1, R2, R3 and R4 contain after the program execution?

    R1 =        R2 =        R3 =        R4 =

The program is in file exempel3.s. Load and run the program. (DP_1, Pipelining)

What do the registers R1, R2, R3 and R4 contain after the execution?

    R1 =        R2 =        R3 =        R4 =

Apparently, something went wrong. To understand what it was, run the program again. Reset the processor and give clock pulses until instruction ADD R2,R1,R1 reaches the ID stage. What is the value of register R1?

    R1 =

Where in the data path is the correct value of register R1?

Give clock pulses until instruction ADD R3,R1,R1 reaches the ID stage. What is the value of register R1 when it is being read?

    R1 =

Where in the data path is the correct value of register R1?

Give clock pulses until instruction ADD R4,R1,R1 reaches the ID stage. What is the value of register R1? Is it correct?

    R1 =

Now you have (hopefully) understood that the register file does not always contain the correct values. Thus, it must be possible to bypass the result of an instruction to the next two instructions if they use this result as their operand.
From which point(s) of the pipeline must this bypassing be possible?

The next data path model, Datapath Model 2 (DP_2), supports the bypassing technique. Choose data path model DP_2 from the Views menu (the processor is reset).

Give clock pulses until instruction ADD R2,R1,R1 reaches the ID stage. Examine the state of multiplexers MX2 and MX3. Where does the value of R1 come from?

Give clock pulses until instruction ADD R3,R1,R1 reaches the ID stage. Examine the state of multiplexers MX2 and MX3 again. Where does the value of R1 come from now?

Give clock pulses until the program has been completely executed and make sure that the registers have the correct values.

Delayed Load

Examine the following program:

    LW   R1,18(R0)  ;R1 <- M[18]
    ADD  R2,R1,R1   ;R2 <- R1+R1
    ADD  R3,R1,R1   ;R3 <- R1+R1
    ADD  R4,R1,R1   ;R4 <- R1+R1

What should the registers R1, R2, R3 and R4 contain after the program execution if M[18] = before the execution?

    R1 =        R2 =        R3 =        R4 =

The program is in file exempel4.s. Load and run the program (DP_2, Pipelining).

What do the registers R1, R2, R3 and R4 contain after the execution?

    R1 =        R2 =        R3 =        R4 =

Apparently, something went wrong. To understand what it was, run the program again. Reset the processor and give clock pulses until instruction ADD R2,R1,R1 reaches the ID stage. In which stage is instruction LW R1,18(R0)?

Give a clock pulse. Where in the data path is the correct value of register R1?

In which stage is instruction ADD R2,R1,R1?
Why does instruction ADD R3,R1,R1 get the correct register value?

The execution of one instruction failed. Which instruction was it, and why did exactly this instruction fail?

This problem can be solved by using Delayed Load. Whenever the result of a LOAD instruction is used by the next instruction, the pipeline is stalled until the bypassing technique can be used to forward the loaded value to the next instruction(s). Activate Delayed Load by clicking the field No Delayed Load. (DP_2, Pipelining, Delayed Load, Delayed Branch)

Give clock pulses until instruction ADD R2,R1,R1 reaches the ID stage. In which stage is instruction LW R1,18(R0)?

Give a clock pulse. In which stage is instruction ADD R2,R1,R1 now?

The pipeline is stalled (STALL), so the instructions in the ID and IF stages no longer progress within the pipeline. From where is the value of register R1 now obtained?

Finish the execution with the Run command and make sure that the values of the registers are now correct. How many clock cycles did the execution take?

How many clock cycles would the execution take if there were no data dependency between the first LOAD instruction and the first ADD instruction?

Delayed Load and Bypassing together guarantee that all data hazards are eliminated. However, one clock cycle is lost every time there is a data dependency between a LOAD instruction and the instruction following it.
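The rule stated above (one lost cycle whenever a LOAD is immediately followed by an instruction that reads the loaded register) can be checked mechanically. The following is a minimal sketch, assuming full bypassing so that only the load-use case costs a cycle. The `Instr` type and `count_load_use_stalls` are hypothetical helpers for illustration, not part of the simulator, and registers R5 to R7 in the second listing are made-up operands.

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class Instr(NamedTuple):
    op: str                 # mnemonic, e.g. "LW" or "ADD"
    dest: Optional[str]     # register written, or None
    srcs: Tuple[str, ...]   # registers read


def count_load_use_stalls(program: Sequence[Instr]) -> int:
    """Count adjacent pairs where a LW result is read by the very next instruction."""
    stalls = 0
    for prev, nxt in zip(program, program[1:]):
        if prev.op == "LW" and prev.dest in nxt.srcs:
            stalls += 1
    return stalls


# The Delayed Load listing from this section: the first ADD reads R1 right after the LW.
dependent = [
    Instr("LW", "R1", ("R0",)),
    Instr("ADD", "R2", ("R1", "R1")),
    Instr("ADD", "R3", ("R1", "R1")),
    Instr("ADD", "R4", ("R1", "R1")),
]

# A hypothetical variant with an unrelated instruction placed after the LW.
independent = [
    Instr("LW", "R1", ("R0",)),
    Instr("ADD", "R5", ("R6", "R7")),
    Instr("ADD", "R2", ("R1", "R1")),
]

print(count_load_use_stalls(dependent))    # 1
print(count_load_use_stalls(independent))  # 0
```

The second listing shows why instruction order matters: the same load costs nothing when the instruction that follows it does not read the loaded register.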
Control hazards

Examine the following program (the memory addresses of the instructions are on the left):

    0000         ADDI R1,R0,#9   ;R1 <- 9
    0004         ADD  R2,R0,R0   ;R2 <- 0
    0008         ADD  R3,R0,R0   ;R3 <- 0
    000C         ADD  R4,R0,R0   ;R4 <- 0
    0010         ADD  R5,R0,R0   ;R5 <- 0
    0014  LOOP:  ADDI R2,R2,#1   ;R2 <- R2+1
    0018         SUBI R1,R1,#1   ;R1 <- R1-1
    001C         BNEZ R1,LOOP    ;if R1 <> 0 then BRANCH to LOOP
    0020         ADD  R3,R3,R2   ;R3 <- R3+R2
    0024         ADD  R4,R4,R2   ;R4 <- R4+R2
    0028         ADD  R5,R5,R2   ;R5 <- R5+R2

What should the registers R1, R2, R3, R4 and R5 contain after the program execution?

    R1 =        R2 =        R3 =        R4 =        R5 =

What should the registers R3, R4 and R5 contain just before instruction ADD R3,R3,R2 is executed?

    R3 =        R4 =        R5 =

How many instructions are executed?

The program is in file exempel5.s. Load and run the program. (DP_2, Pipelining, Delayed Load, Delayed Branch)

What do the registers R1, R2, R3, R4 and R5 contain after the execution?

    R1 =        R2 =        R3 =        R4 =        R5 =

Reset the processor and give clock pulses until instruction BNEZ R1,LOOP reaches the ID stage for the first time.

Give a clock pulse. Which instruction is fetched?

Give a clock pulse. Which instruction is now fetched? What is the value of the program counter?

    PC =

Give a clock pulse. What is the value of the program counter and which instruction is now fetched?

    PC =
Give a clock pulse. What is the value of register R3 and why is it wrong?

    R3 =

After the branch instruction has reached the ID stage, how many clock cycles does it take until the address determined by the branch instruction is loaded into the program counter?

The processor does not function correctly. The problem is that the branch address is not loaded into the program counter immediately, so some instructions following the branch instruction are executed regardless of whether the branch condition evaluates to true or false. The simple solution is to stop the pipeline whenever a branch instruction is about to proceed from the ID stage. The execution continues only after the jump address has been calculated and the jump condition evaluated or, if the branch is taken, after the program counter has been updated to point to the new instruction. This is achieved by using the option No Delayed Branch, which we will study next.

Change the processor settings to: DP_2, Pipelining, Delayed Load, No Delayed Branch. Give clock pulses until instruction BNEZ R1,LOOP reaches the ID stage for the first time.

Give a clock pulse. Which instruction is fetched?

The fetching of instructions has apparently been halted. After how many clock cycles is the fetching of new instructions resumed?

Finish the execution of the program and make sure that the values of the registers are correct. How many clock cycles does the execution take?

Why does the number of clock cycles used differ greatly from the number of executed instructions?

Now the processor works correctly with branch instructions, but the performance is greatly degraded due to the large branch overhead. In practice, processors spend a large portion of the total execution time within program loops, and since the loops are predominantly short, a large loop overhead is simply not acceptable.
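The cost of stalling on every branch can be estimated with a rough model. The following is a minimal sketch, assuming a five-stage pipeline, no other hazards, and the same stall penalty for every executed branch; in the simulator the penalty may differ between taken and untaken branches, and its actual value is what the questions above ask you to observe. The function name, the instruction counts and the penalty of 3 cycles are all hypothetical.

```python
def cycles_with_branch_stalls(n_instructions, n_branches,
                              stall_cycles_per_branch, stages=5):
    """Estimate total cycles: one per instruction, plus pipeline fill, plus branch stalls."""
    fill = stages - 1
    return n_instructions + fill + n_branches * stall_cycles_per_branch


# Hypothetical workload: 1000 executed instructions, 200 of them branches,
# with an assumed penalty of 3 stall cycles per branch.
total = cycles_with_branch_stalls(1000, 200, stall_cycles_per_branch=3)
print(total)          # 1604
print(total / 1000)   # about 1.6 cycles per instruction

# The same workload with an assumed penalty of 1 cycle per branch.
print(cycles_with_branch_stalls(1000, 200, stall_cycles_per_branch=1))  # 1204
```

Because short loops execute a branch every few instructions, the stall term dominates quickly, which is why reducing the per-branch penalty has a large effect on the total.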
Advancing the jump address calculation

The problem with Datapath Model 2 is that the jump address and branch condition are calculated in the EX stage, so the results are not available until the MEM stage. If the calculation could be done earlier, the loop overhead would be smaller.
The earliest stage in which the processor recognizes an instruction to be a branch instruction is the ID stage. Thus, the jump address calculation and jump condition evaluation can be done already in the ID stage, as seen in the datapath model DLX.

Choose datapath model DLX (DLX, Pipelining, Delayed Load, No Delayed Branch). How does the DLX model differ from the DP_2 model?

Load the same program as earlier (exempel5.s). Give clock pulses until instruction BNEZ R1,LOOP reaches the ID stage for the first time.

Give a clock pulse. Which instruction is now fetched and why?

Finish the program execution. How many clock cycles did the execution take?

How much faster was the execution compared to DP_2?

What is the CPI value?

Now the CPI value is close to one, but not quite one. What is the reason for this?

Next, we will examine how the programmer and compiler can further reduce the CPI figure.

Co-operation between the compiler and pipeline

Examine the following program:

           ADDI R2,R0,#200  ;R2 <- 200
    LOOP:  LW   R3,28(R2)   ;R3 <- M[28+R2]
           ADD  R1,R3,R1    ;R1 <- R3+R1
           SUBI R2,R2,#4    ;R2 <- R2-4
           BNEZ R2,LOOP     ;if R2<>0 then BRANCH to LOOP

The program is in file exempel6.s. Load and run the program.
What is the CPI value?

As the number of executed instructions in a program increases, the effect of the clock cycles needed for filling the pipeline becomes insignificant. Thus, the CPI value can be approximated by dividing the number of clock cycles it takes to execute one loop iteration by the number of instructions in the loop.

Calculate an approximation for the CPI figure. In addition to the result, also give the formula you used to obtain it.

Static instruction scheduling

Delayed load

What does the delayed load technique imply? Consider the issue from the viewpoints of the programmer/compiler and the hardware.

Examine the following program:

           ADDI R2,R0,#200  ;R2 <- 200
    LOOP:  LW   R3,28(R2)   ;R3 <- M[28+R2]
           SUBI R2,R2,#4    ;R2 <- R2-4
           ADD  R1,R3,R1    ;R1 <- R3+R1
           BNEZ R2,LOOP     ;if R2<>0 then BRANCH to LOOP

How does this program differ from the previous one (exempel6.s)?

The program is in file exempel7.s. Load and run the program. (DLX, Pipelining, Delayed Load, No Delayed Branch)

What is the CPI value?
Compare the CPI figure with the proportion of branch instructions among all executed instructions. Why is the CPI value now better than with the previous program?

How much faster was the execution compared to the previous code (exempel6.s)?

Delayed Branch

What does the delayed branch technique imply? Consider the issue from the viewpoints of the programmer/compiler and the hardware.

Examine the following program:

           ADDI R2,R0,#200  ;R2 <- 200
    LOOP:  LW   R3,28(R2)   ;R3 <- M[28+R2]
           SUBI R2,R2,#4    ;R2 <- R2-4
           BNEZ R2,LOOP     ;if R2<>0 then BRANCH to LOOP
           ADD  R1,R3,R1    ;R1 <- R3+R1

How does this program differ from the previous one (exempel7.s)?

Activate the option Delayed Branch. The program is in file exempel8.s. Load and run the program. (DLX, Pipelining, Delayed Load, Delayed Branch)

What is the CPI value now?
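The loop approximation described earlier (clock cycles per loop iteration divided by instructions per loop iteration) also gives a quick way to compare two schedules of the same loop. The following is a minimal sketch with hypothetical numbers; the function names are illustrative, and the stall counts must be taken from your own simulator observations rather than from this example.

```python
def loop_cpi(instructions_per_iteration, stall_cycles_per_iteration):
    """Approximate CPI of a long-running loop, ignoring pipeline fill."""
    cycles = instructions_per_iteration + stall_cycles_per_iteration
    return cycles / instructions_per_iteration


def speedup(cpi_before, cpi_after):
    """Execution-time ratio when both versions execute the same number of instructions."""
    return cpi_before / cpi_after


# Hypothetical five-instruction loop: 2 stall cycles per iteration before
# rescheduling, none afterwards.
before = loop_cpi(5, 2)
after = loop_cpi(5, 0)
print(before)                  # 1.4
print(after)                   # 1.0
print(speedup(before, after))  # 1.4
```

The speed-up ratio is only valid when rescheduling leaves the instruction count unchanged; if instructions such as NOP are added or removed, compare total clock cycles instead.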
The compiler cannot always place a useful instruction after a branch instruction. In these cases, an instruction that does not affect the calculation must be used. Such an instruction is NOP (No Operation). An example of this situation is shown in the following program:

           ADDI R2,R0,#200  ;R2 <- 200
           ADD  R1,R0,R0    ;R1 <- 0
    LOOP:  ADD  R1,R1,R2    ;R1 <- R1+R2
           SUBI R2,R2,#1    ;R2 <- R2-1
           BNEZ R2,LOOP     ;if R2<>0 then BRANCH to LOOP

Here, the compiler cannot schedule any of the instructions after the branch instruction. Thus, a NOP instruction must be added.

           ADDI R2,R0,#200  ;R2 <- 200
           ADD  R1,R0,R0    ;R1 <- 0
    LOOP:  ADD  R1,R1,R2    ;R1 <- R1+R2
           SUBI R2,R2,#1    ;R2 <- R2-1
           BNEZ R2,LOOP     ;if R2<>0 then BRANCH to LOOP
           NOP

Calculate the CPI figure for the above program. In addition to the result, also give the formula you used to obtain it.

The program is in file exempel9.s. Load and run the program. (DLX, Pipelining, Delayed Load, Delayed Branch) Is the resulting CPI figure the same as what you calculated?

The NOP instruction can be eliminated by using more sophisticated methods. The following program performs the same function as the previous code, but without the NOP instruction.

           ADDI R2,R0,#200  ;R2 <- 200
           ADD  R1,R0,R0    ;R1 <- 0
           ADD  R1,R1,R2    ;R1 <- R1+R2
    LOOP:  SUBI R2,R2,#1    ;R2 <- R2-1
           BNEZ R2,LOOP     ;if R2<>0 then BRANCH to LOOP
           ADD  R1,R1,R2    ;R1 <- R1+R2
           SUB  R1,R1,R2    ;R1 <- R1-R2

Two instructions had to be added to the program for it to yield correct results. Mark these instructions in the above code.

The CPI figure is now reduced to one. The penalty of this modification is that the number of instructions increases by two. Nevertheless, as loops are usually executed more than two times, the benefits far outweigh the drawbacks. However, not all compilers are able to do this kind of sophisticated analysis.
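That the two listings above compute the same R1 can be checked by replaying them at the register level. The following is a minimal sketch, assuming the delayed-branch rule that the single instruction after BNEZ always executes, whether or not the branch is taken. The function names and the parameter `n` (the initial value of R2, assumed to be at least 1) are illustrative; timing is not modelled.

```python
def run_with_nop(n=200):
    """Replay the version whose delay slot holds a NOP."""
    r1, r2 = 0, n                 # ADDI R2,R0,#n ; ADD R1,R0,R0
    while True:
        r1 += r2                  # LOOP: ADD R1,R1,R2
        r2 -= 1                   # SUBI R2,R2,#1
        taken = r2 != 0           # BNEZ R2,LOOP evaluates its condition
        # delay slot: NOP, no effect
        if not taken:
            break
    return r1


def run_without_nop(n=200):
    """Replay the version whose delay slot holds ADD R1,R1,R2."""
    r1, r2 = 0, n                 # ADDI R2,R0,#n ; ADD R1,R0,R0
    r1 += r2                      # ADD R1,R1,R2 placed before LOOP
    while True:
        r2 -= 1                   # LOOP: SUBI R2,R2,#1
        taken = r2 != 0           # BNEZ R2,LOOP evaluates its condition
        r1 += r2                  # delay slot: ADD R1,R1,R2 always executes
        if not taken:
            break
    r1 -= r2                      # SUB R1,R1,R2 after the loop
    return r1


print(run_with_nop())     # 20100
print(run_without_nop())  # 20100
```

Both versions produce the sum 200 + 199 + ... + 1, so comparing the final R1 in the simulator is a valid correctness check for the rescheduled code.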
The program is in file exempel10.s. Load and run the program. (DLX, Pipelining, Delayed Load, Delayed Branch) What is the CPI figure now?

Compare the CPI value and the execution time to the previous program (exempel9.s).

Examine the following program:

            ADDI  R1,R0,#16     ;R1 <- 16
            ADDUI R2,R0,#4777   ;R2 <- 4777
            ADDUI R3,R0,#1326   ;R3 <- 1326
            ADDI  R5,R0,#0      ;R5 <- 0
            ADDUI R6,R0,#32768  ;R6 <- 32768
    FRANK:  AND   R4,R2,R6      ;R4 <- R2 AND R6
            SLLI  R2,R2,#1      ;R2 <- R2<<1
            SLLI  R5,R5,#1      ;R5 <- R5<<1
            SUBI  R1,R1,#1      ;R1 <- R1-1
            BNEZ  R4,ZED        ;if R4<>0 then BRANCH to ZED
            BNEZ  R1,FRANK      ;if R1<>0 then BRANCH to FRANK
            J     END           ;BRANCH to END
    ZED:    ADD   R5,R5,R3      ;R5 <- R5+R3
            BNEZ  R1,FRANK      ;if R1<>0 then BRANCH to FRANK
    END:    TRAP  0             ;end program

What does this program do? If your answer does not easily fit on the following line, it is wrong.

The execution time of the program depends on the initial value of R2 (4777 in the example code). With which R2 values is the execution time maximized and minimized?

    Max: R2 =        (hexadecimal) =        (decimal)
    Min: R2 =        (hexadecimal) =        (decimal)

The program is in file zorbas1.s. Load and run the program. (DLX, Pipelining, Delayed Load, No Delayed Branch) Write the results given by the simulator on the next line.

    Elapsed Cycles =        CPI =        Hazards =        R5 =        (hexadecimal)

Modify the program so that all control hazards are eliminated. The execution time and CPI figure must also be improved. Use Delayed Branch. You can re-order, add, and remove instructions. The NOP instruction or its look-alikes, such as writing into a register that is never read, must not be used. Correct functioning of the program can be tested by comparing the final value of R5 to the value produced by the example code (zorbas1.s). The final values of the other registers are largely irrelevant.
Write the simulation results of the modified code on the next line.

    Elapsed Cycles =        CPI =        Hazards =        R5 =        (hexadecimal)

Write the code on the next lines. The instructions preceding the loops do not have to be written, unless they have been modified.
More informationECE473 Computer Architecture and Organization. Pipeline: Control Hazard
Computer Architecture and Organization Pipeline: Control Hazard Lecturer: Prof. Yifeng Zhu Fall, 2015 Portions of these slides are derived from: Dave Patterson UCB Lec 15.1 Pipelining Outline Introduction
More informationPipelining. Maurizio Palesi
* Pipelining * Adapted from David A. Patterson s CS252 lecture slides, http://www.cs.berkeley/~pattrsn/252s98/index.html Copyright 1998 UCB 1 References John L. Hennessy and David A. Patterson, Computer
More informationPipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome
Thoai Nam Pipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome Reference: Computer Architecture: A Quantitative Approach, John L Hennessy & David a Patterson,
More informationCOMPUTER ORGANIZATION AND DESIGN
COMPUTER ORGANIZATION AND DESIGN 5 Edition th The Hardware/Software Interface Chapter 4 The Processor 4.1 Introduction Introduction CPU performance factors Instruction count CPI and Cycle time Determined
More informationInstruction Frequency CPI. Load-store 55% 5. Arithmetic 30% 4. Branch 15% 4
PROBLEM 1: An application running on a 1GHz pipelined processor has the following instruction mix: Instruction Frequency CPI Load-store 55% 5 Arithmetic 30% 4 Branch 15% 4 a) Determine the overall CPI
More informationLecture 8 Dynamic Branch Prediction, Superscalar and VLIW. Computer Architectures S
Lecture 8 Dynamic Branch Prediction, Superscalar and VLIW Computer Architectures 521480S Dynamic Branch Prediction Performance = ƒ(accuracy, cost of misprediction) Branch History Table (BHT) is simplest
More informationChapter 4. The Processor
Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware 4.1 Introduction We will examine two MIPS implementations
More informationCENG 3531 Computer Architecture Spring a. T / F A processor can have different CPIs for different programs.
Exam 2 April 12, 2012 You have 80 minutes to complete the exam. Please write your answers clearly and legibly on this exam paper. GRADE: Name. Class ID. 1. (22 pts) Circle the selected answer for T/F and
More informationChapter 4. The Processor
Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified
More informationInstr. execution impl. view
Pipelining Sangyeun Cho Computer Science Department Instr. execution impl. view Single (long) cycle implementation Multi-cycle implementation Pipelined implementation Processing an instruction Fetch instruction
More informationPipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome
Pipeline Thoai Nam Outline Pipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome Reference: Computer Architecture: A Quantitative Approach, John L Hennessy
More informationCOMPUTER ORGANIZATION AND DESI
COMPUTER ORGANIZATION AND DESIGN 5 Edition th The Hardware/Software Interface Chapter 4 The Processor 4.1 Introduction Introduction CPU performance factors Instruction count Determined by ISA and compiler
More informationBasic Instruction Timings. Pipelining 1. How long would it take to execute the following sequence of instructions?
Basic Instruction Timings Pipelining 1 Making some assumptions regarding the operation times for some of the basic hardware units in our datapath, we have the following timings: Instruction class Instruction
More informationLecture 9: Case Study MIPS R4000 and Introduction to Advanced Pipelining Professor Randy H. Katz Computer Science 252 Spring 1996
Lecture 9: Case Study MIPS R4000 and Introduction to Advanced Pipelining Professor Randy H. Katz Computer Science 252 Spring 1996 RHK.SP96 1 Review: Evaluating Branch Alternatives Two part solution: Determine
More information6.004 Tutorial Problems L22 Branch Prediction
6.004 Tutorial Problems L22 Branch Prediction Branch target buffer (BTB): Direct-mapped cache (can also be set-associative) that stores the target address of jumps and taken branches. The BTB is searched
More informationPlease state clearly any assumptions you make in solving the following problems.
Computer Architecture Homework 3 2012-2013 Please state clearly any assumptions you make in solving the following problems. 1 Processors Write a short report on at least five processors from at least three
More information3/12/2014. Single Cycle (Review) CSE 2021: Computer Organization. Single Cycle with Jump. Multi-Cycle Implementation. Why Multi-Cycle?
CSE 2021: Computer Organization Single Cycle (Review) Lecture-10b CPU Design : Pipelining-1 Overview, Datapath and control Shakil M. Khan 2 Single Cycle with Jump Multi-Cycle Implementation Instruction:
More informationCS3350B Computer Architecture Quiz 3 March 15, 2018
CS3350B Computer Architecture Quiz 3 March 15, 2018 Student ID number: Student Last Name: Question 1.1 1.2 1.3 2.1 2.2 2.3 Total Marks The quiz consists of two exercises. The expected duration is 30 minutes.
More informationPipelining. CSC Friday, November 6, 2015
Pipelining CSC 211.01 Friday, November 6, 2015 Performance Issues Longest delay determines clock period Critical path: load instruction Instruction memory register file ALU data memory register file Not
More informationInstruction Level Parallelism. Appendix C and Chapter 3, HP5e
Instruction Level Parallelism Appendix C and Chapter 3, HP5e Outline Pipelining, Hazards Branch prediction Static and Dynamic Scheduling Speculation Compiler techniques, VLIW Limits of ILP. Implementation
More informationInstruction Pipelining Review
Instruction Pipelining Review Instruction pipelining is CPU implementation technique where multiple operations on a number of instructions are overlapped. An instruction execution pipeline involves a number
More informationCOMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 4. The Processor
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle
More informationEC 413 Computer Organization - Fall 2017 Problem Set 3 Problem Set 3 Solution
EC 413 Computer Organization - Fall 2017 Problem Set 3 Problem Set 3 Solution Important guidelines: Always state your assumptions and clearly explain your answers. Please upload your solution document
More informationMulti-cycle Instructions in the Pipeline (Floating Point)
Lecture 6 Multi-cycle Instructions in the Pipeline (Floating Point) Introduction to instruction level parallelism Recap: Support of multi-cycle instructions in a pipeline (App A.5) Recap: Superpipelining
More informationSpeeding Up DLX Computer Architecture Hadassah College Spring 2018 Speeding Up DLX Dr. Martin Land
Speeding Up DLX 1 DLX Execution Stages Version 1 Clock Cycle 1 I 1 enters Instruction Fetch (IF) Clock Cycle2 I 1 moves to Instruction Decode (ID) Instruction Fetch (IF) holds state fixed Clock Cycle3
More informationPipelining is Hazardous!
Pipelining is Hazardous! Hazards are situations where pipelining does not work as elegantly as we would like Three kinds Structural hazards -- we have run out of a hardware resource. Data hazards -- an
More informationEE557--FALL 1999 MAKE-UP MIDTERM 1. Closed books, closed notes
NAME: STUDENT NUMBER: EE557--FALL 1999 MAKE-UP MIDTERM 1 Closed books, closed notes Q1: /1 Q2: /1 Q3: /1 Q4: /1 Q5: /15 Q6: /1 TOTAL: /65 Grade: /25 1 QUESTION 1(Performance evaluation) 1 points We are
More informationFinal Exam Fall 2007
ICS 233 - Computer Architecture & Assembly Language Final Exam Fall 2007 Wednesday, January 23, 2007 7:30 am 10:00 am Computer Engineering Department College of Computer Sciences & Engineering King Fahd
More informationENGN 2910A Homework 03 (140 points) Due Date: Oct 3rd 2013
ENGN 2910A Homework 03 (140 points) Due Date: Oct 3rd 2013 Professor: Sherief Reda School of Engineering, Brown University 1. [from Debois et al. 30 points] Consider the non-pipelined implementation of
More informationFloating Point/Multicycle Pipelining in DLX
Floating Point/Multicycle Pipelining in DLX Completion of DLX EX stage floating point arithmetic operations in one or two cycles is impractical since it requires: A much longer CPU clock cycle, and/or
More informationLecture 5: Instruction Pipelining. Pipeline hazards. Sequential execution of an N-stage task: N Task 2
Lecture 5: Instruction Pipelining Basic concepts Pipeline hazards Branch handling and prediction Zebo Peng, IDA, LiTH Sequential execution of an N-stage task: 3 N Task 3 N Task Production time: N time
More informationComputer System Architecture Quiz #1 March 8th, 2019
Computer System Architecture 6.823 Quiz #1 March 8th, 2019 Name: This is a closed book, closed notes exam. 80 Minutes 14 Pages (+2 Scratch) Notes: Not all questions are of equal difficulty, so look over
More informationCS146 Computer Architecture. Fall Midterm Exam
CS146 Computer Architecture Fall 2002 Midterm Exam This exam is worth a total of 100 points. Note the point breakdown below and budget your time wisely. To maximize partial credit, show your work and state
More informationCS152 Computer Architecture and Engineering March 13, 2008 Out of Order Execution and Branch Prediction Assigned March 13 Problem Set #4 Due March 25
CS152 Computer Architecture and Engineering March 13, 2008 Out of Order Execution and Branch Prediction Assigned March 13 Problem Set #4 Due March 25 http://inst.eecs.berkeley.edu/~cs152/sp08 The problem
More information6.823 Computer System Architecture Datapath for DLX Problem Set #2
6.823 Computer System Architecture Datapath for DLX Problem Set #2 Spring 2002 Students are allowed to collaborate in groups of up to 3 people. A group hands in only one copy of the solution to a problem
More informationComputer Architectures. DLX ISA: Pipelined Implementation
Computer Architectures L ISA: Pipelined Implementation 1 The Pipelining Principle Pipelining is nowadays the main basic technique deployed to speed-up a CP. The key idea for pipelining is general, and
More informationThe Processor Pipeline. Chapter 4, Patterson and Hennessy, 4ed. Section 5.3, 5.4: J P Hayes.
The Processor Pipeline Chapter 4, Patterson and Hennessy, 4ed. Section 5.3, 5.4: J P Hayes. Pipeline A Basic MIPS Implementation Memory-reference instructions Load Word (lw) and Store Word (sw) ALU instructions
More informationVertieferlabor Mikroelektronik (85-324) & Embedded Processor Lab (85-546) Task 5
FB Elektrotechnik und Informationstechnik Prof. Dr.-Ing. Norbert Wehn Dozent: Uwe Wasenmüller Raum 12-213, wa@eit.uni-kl.de Task 5 Introduction Subject of the this task is the extension of the fundamental
More informationLECTURE 10. Pipelining: Advanced ILP
LECTURE 10 Pipelining: Advanced ILP EXCEPTIONS An exception, or interrupt, is an event other than regular transfers of control (branches, jumps, calls, returns) that changes the normal flow of instruction
More informationStatic, multiple-issue (superscaler) pipelines
Static, multiple-issue (superscaler) pipelines Start more than one instruction in the same cycle Instruction Register file EX + MEM + WB PC Instruction Register file EX + MEM + WB 79 A static two-issue
More informationRecall from Pipelining Review. Lecture 16: Instruction Level Parallelism and Dynamic Execution #1: Ideas to Reduce Stalls
CS252 Graduate Computer Architecture Recall from Pipelining Review Lecture 16: Instruction Level Parallelism and Dynamic Execution #1: March 16, 2001 Prof. David A. Patterson Computer Science 252 Spring
More informationAdvanced Pipelining and Instruction- Level Parallelism 4
4 Advanced Pipelining and Instruction- Level Parallelism 4 Who s first? America. Who s second? Sir, there is no second. Dialog between two observers of the sailing race later named The America s Cup and
More informationComplex Pipelines and Branch Prediction
Complex Pipelines and Branch Prediction Daniel Sanchez Computer Science & Artificial Intelligence Lab M.I.T. L22-1 Processor Performance Time Program Instructions Program Cycles Instruction CPI Time Cycle
More informationReview: Evaluating Branch Alternatives. Lecture 3: Introduction to Advanced Pipelining. Review: Evaluating Branch Prediction
Review: Evaluating Branch Alternatives Lecture 3: Introduction to Advanced Pipelining Two part solution: Determine branch taken or not sooner, AND Compute taken branch address earlier Pipeline speedup
More informationOrange Coast College. Business Division. Computer Science Department. CS 116- Computer Architecture. Pipelining
Orange Coast College Business Division Computer Science Department CS 116- Computer Architecture Pipelining Recall Pipelining is parallelizing execution Key to speedups in processors Split instruction
More informationThe Evolution of Microprocessors. Per Stenström
The Evolution of Microprocessors Per Stenström Processor (Core) Processor (Core) Processor (Core) L1 Cache L1 Cache L1 Cache L2 Cache Microprocessor Chip Memory Evolution of Microprocessors Multicycle
More information4. What is the average CPI of a 1.4 GHz machine that executes 12.5 million instructions in 12 seconds?
Chapter 4: Assessing and Understanding Performance 1. Define response (execution) time. 2. Define throughput. 3. Describe why using the clock rate of a processor is a bad way to measure performance. Provide
More informationLaboratory 05. Single-Cycle MIPS CPU Design smaller: 16-bits version One clock cycle per instruction
Laboratory 05 Single-Cycle MIPS CPU Design smaller: 16-bits version One clock cycle per instruction 1. Objectives Study, design, implement and test Instruction Fetch Unit for the 16-bit Single-Cycle MIPS
More informationSuper Scalar. Kalyan Basu March 21,
Super Scalar Kalyan Basu basu@cse.uta.edu March 21, 2007 1 Super scalar Pipelines A pipeline that can complete more than 1 instruction per cycle is called a super scalar pipeline. We know how to build
More informationFull Datapath. Chapter 4 The Processor 2
Pipelining Full Datapath Chapter 4 The Processor 2 Datapath With Control Chapter 4 The Processor 3 Performance Issues Longest delay determines clock period Critical path: load instruction Instruction memory
More informationLecture Topics. Announcements. Today: Data and Control Hazards (P&H ) Next: continued. Exam #1 returned. Milestone #5 (due 2/27)
Lecture Topics Today: Data and Control Hazards (P&H 4.7-4.8) Next: continued 1 Announcements Exam #1 returned Milestone #5 (due 2/27) Milestone #6 (due 3/13) 2 1 Review: Pipelined Implementations Pipelining
More informationInstruction Level Parallelism. ILP, Loop level Parallelism Dependences, Hazards Speculation, Branch prediction
Instruction Level Parallelism ILP, Loop level Parallelism Dependences, Hazards Speculation, Branch prediction Basic Block A straight line code sequence with no branches in except to the entry and no branches
More informationInstruction Pipelining
Instruction Pipelining Simplest form is a 3-stage linear pipeline New instruction fetched each clock cycle Instruction finished each clock cycle Maximal speedup = 3 achieved if and only if all pipe stages
More informationELE 818 * ADVANCED COMPUTER ARCHITECTURES * MIDTERM TEST *
ELE 818 * ADVANCED COMPUTER ARCHITECTURES * MIDTERM TEST * SAMPLE 1 Section: Simple pipeline for integer operations For all following questions we assume that: a) Pipeline contains 5 stages: IF, ID, EX,
More informationInstruction Pipelining
Instruction Pipelining Simplest form is a 3-stage linear pipeline New instruction fetched each clock cycle Instruction finished each clock cycle Maximal speedup = 3 achieved if and only if all pipe stages
More informationIntroduction to Pipelining. Silvina Hanono Wachman Computer Science & Artificial Intelligence Lab M.I.T.
Introduction to Pipelining Silvina Hanono Wachman Computer Science & Artificial Intelligence Lab M.I.T. L15-1 Performance Measures Two metrics of interest when designing a system: 1. Latency: The delay
More informationComputer Architecture Practical 1 Pipelining
Computer Architecture Issued: Monday 28 January 2008 Due: Friday 15 February 2008 at 4.30pm (at the ITO) This is the first of two practicals for the Computer Architecture module of CS3. Together the practicals
More informationCS3350B Computer Architecture Winter 2015
CS3350B Computer Architecture Winter 2015 Lecture 5.5: Single-Cycle CPU Datapath Design Marc Moreno Maza www.csd.uwo.ca/courses/cs3350b [Adapted from lectures on Computer Organization and Design, Patterson
More informationHakim Weatherspoon CS 3410 Computer Science Cornell University
Hakim Weatherspoon CS 3410 Computer Science Cornell University The slides are the product of many rounds of teaching CS 3410 by Professors Weatherspoon, Bala, Bracy, McKee, and Sirer. memory inst register
More informationECE 486/586. Computer Architecture. Lecture # 12
ECE 486/586 Computer Architecture Lecture # 12 Spring 2015 Portland State University Lecture Topics Pipelining Control Hazards Delayed branch Branch stall impact Implementing the pipeline Detecting hazards
More informationComputer Architecture EE 4720 Practice Final Examination
Name Computer Architecture EE 4720 Practice Final Examination 10 May 1997, 02:00 04:00 CDT :-) Alias Problem 1 Problem 2 Problem 3 Problem 4 Exam Total (100 pts) Good Luck! Problem 1: Systems A and B are
More informationCISC 662 Graduate Computer Architecture Lecture 13 - CPI < 1
CISC 662 Graduate Computer Architecture Lecture 13 - CPI < 1 Michela Taufer http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson s: Computer
More informationChapter 4. The Processor. Instruction count Determined by ISA and compiler. We will examine two MIPS implementations
Chapter 4 The Processor Part I Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations
More informationHY425 Lecture 05: Branch Prediction
HY425 Lecture 05: Branch Prediction Dimitrios S. Nikolopoulos University of Crete and FORTH-ICS October 19, 2011 Dimitrios S. Nikolopoulos HY425 Lecture 05: Branch Prediction 1 / 45 Exploiting ILP in hardware
More informationChapter 4 The Processor 1. Chapter 4B. The Processor
Chapter 4 The Processor 1 Chapter 4B The Processor Chapter 4 The Processor 2 Control Hazards Branch determines flow of control Fetching next instruction depends on branch outcome Pipeline can t always
More informationCS425 Computer Systems Architecture
CS425 Computer Systems Architecture Fall 2018 Static Instruction Scheduling 1 Techniques to reduce stalls CPI = Ideal CPI + Structural stalls per instruction + RAW stalls per instruction + WAR stalls per
More informationComputer System Architecture Final Examination Spring 2002
Computer System Architecture 6.823 Final Examination Spring 2002 Name: This is an open book, open notes exam. 180 Minutes 22 Pages Notes: Not all questions are of equal difficulty, so look over the entire
More informationAppendix C: Pipelining: Basic and Intermediate Concepts
Appendix C: Pipelining: Basic and Intermediate Concepts Key ideas and simple pipeline (Section C.1) Hazards (Sections C.2 and C.3) Structural hazards Data hazards Control hazards Exceptions (Section C.4)
More informationChapter 4. The Processor
Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified
More informationDesign for a simplified DLX (SDLX) processor Rajat Moona
Design for a simplified DLX (SDLX) processor Rajat Moona moona@iitk.ac.in In this handout we shall see the design of a simplified DLX (SDLX) processor. We shall assume that the readers are familiar with
More informationCS252 Graduate Computer Architecture Midterm 1 Solutions
CS252 Graduate Computer Architecture Midterm 1 Solutions Part A: Branch Prediction (22 Points) Consider a fetch pipeline based on the UltraSparc-III processor (as seen in Lecture 5). In this part, we evaluate
More informationExploiting ILP with SW Approaches. Aleksandar Milenković, Electrical and Computer Engineering University of Alabama in Huntsville
Lecture : Exploiting ILP with SW Approaches Aleksandar Milenković, milenka@ece.uah.edu Electrical and Computer Engineering University of Alabama in Huntsville Outline Basic Pipeline Scheduling and Loop
More informationUnresolved data hazards. CS2504, Spring'2007 Dimitris Nikolopoulos
Unresolved data hazards 81 Unresolved data hazards Arithmetic instructions following a load, and reading the register updated by the load: if (ID/EX.MemRead and ((ID/EX.RegisterRt = IF/ID.RegisterRs) or
More information5008: Computer Architecture HW#2
5008: Computer Architecture HW#2 1. We will now support for register-memory ALU operations to the classic five-stage RISC pipeline. To offset this increase in complexity, all memory addressing will be
More informationLecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1
Lecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1 Introduction Chapter 4.1 Chapter 4.2 Review: MIPS (RISC) Design Principles Simplicity favors regularity fixed size instructions small number
More informationLecture 4: Advanced Pipelines. Data hazards, control hazards, multi-cycle in-order pipelines (Appendix A.4-A.10)
Lecture 4: Advanced Pipelines Data hazards, control hazards, multi-cycle in-order pipelines (Appendix A.4-A.10) 1 Hazards Structural hazards: different instructions in different stages (or the same stage)
More information