
Microprogram Control: Practice Problems (Cont.)

The following microinstructions are supported by each CW in the CS:

    RR ← ALU op x    The ALU performs operation x and puts the result in the RR.
    RA ← Rx          The ALU operand Register A is loaded with the contents of Rx.
    RB ← Rx          The ALU operand Register B is loaded with the contents of Rx.
    RB ← IR(adr)     RB is loaded with the contents of the address field of the IR. (Supports immediate addressing.)
    Rx ← RR          Rx is loaded from RR (the result of an ALU operation).
    Rx ← MDR         Rx is loaded from the MDR. (For a LOAD operation.)
    MDR ← RR         MDR is loaded from the ALU Result Register. (Result store.)
    MDR ← Rx         MDR is loaded from Rx. (For a STORE operation.)
    MAR ← IR(adr)    Load MAR from the address field of the IR. (Supports direct addressing.)
    MAR ← Rx         Load MAR from Rx. (Supports register indirect addressing.)
    PC ← IR(adr)     Load the PC from the address field of the IR. (For branches.)

Questions:

1. How many bits are in the CSA field?
2. How many bits are in the Branch Control field?
3. If every possible microinstruction has its own bit in each CW, how many bits are there in each CW, not counting the CSA and BC fields? Hint: take into account that there are several GPRs and several ALU operations, including NOP.
4. Since the ALU can only do one operation at a time, the ALU control microinstructions can be encoded into a single field. How many bits will it contain?
5. Besides the ALU operations, many of the other microinstructions are also mutually exclusive (cannot be active at the same time). A set of such instructions can be encoded into a single field to reduce the total number of bits in each CW. Using this technique, find the minimum number of fields a CW may have (including the CSA and BC fields) and the number of bits in each field.
6. Using the results from questions 4 and 5, and including all fields, how many total bits are there in a CW for this design?
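The arithmetic behind questions 1, 4, and 5 is just counting the mutually exclusive options in a group and taking the ceiling of a base-2 logarithm. Here is a minimal Python sketch of that counting; every count in it (the number of GPRs, the number of ALU operations, the control-store size) is an assumption standing in for the actual values given earlier in these notes:

import math

# All counts below are assumptions; substitute the values from the notes.
NUM_GPRS   = 8    # assumed number of general-purpose registers Rx
NUM_ALUOPS = 8    # assumed number of ALU operations, NOP included
CS_WORDS   = 256  # assumed control-store size; fixes the CSA field width

def field_bits(options: int) -> int:
    """Bits needed to encode `options` mutually exclusive choices."""
    return math.ceil(math.log2(options))

# Question 1: the CSA field must be able to address any CW in the CS.
print("CSA field:", field_bits(CS_WORDS), "bits")

# Question 4: one encoded ALU field (NOP is one of the encoded options).
print("ALU field:", field_bits(NUM_ALUOPS), "bits")

# Question 5, one example group: the loads of RA are mutually exclusive,
# so 'RA <- Rx' for each GPR plus an idle code fit in a single field.
print("RA source field:", field_bits(NUM_GPRS + 1), "bits")

# Another example group: RB can be loaded from any GPR, from IR(adr),
# or from neither (idle).
print("RB source field:", field_bits(NUM_GPRS + 2), "bits")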

7. Instruction Prefetch

Recall from our earlier discussion that one measure of performance is the number of cycles, on average, it takes for the CPU to execute a single instruction. This is expressed as Cycles per Instruction (CPI).[22] The ideal is to achieve an average of one cycle per instruction.[23]

[22] Other measures of performance include MIPS (Millions of Instructions per Second) and throughput, which can be expressed as the number of jobs per unit time.

[23] A moment's thought and you will realize that this implies either that all instructions take one cycle (unlikely), or that some instructions must take less than one cycle.

Now consider the typical Fetch-Execute cycle as described earlier. As presented, executing an instruction took five cycles (Fetch, Decode, Operand Fetch, Execute, and Putaway, which we will call F, D, O, E, and P in the following diagrams), so we may presume that such a CPU achieves a CPI of 5. In actuality, the fetch cycle requires a memory access which, if the instruction is in main storage (RAM), will generally take much more than one cycle. Similarly, the operand fetch cycle may also require a memory access, perhaps more than one. For the remainder of this discussion we will assume a register-based architecture that finds all operands in GPRs, which can be accessed in one cycle. That still leaves the memory access for the instruction itself. Let's call this memory access time $T_m$. Then the number of cycles it takes to execute an instruction is $T_m + 4$. If $T_m = 10$ (and memory access times in large mainframes might typically be much larger than this), then we have a CPI of 14, nowhere near the CPI of 1 we desire. In the figure below we show three such instructions being executed back to back. Notice that we gain some benefit from overlapping the fetch of an instruction with the execution of the previous instruction, since the execution hardware and the storage access hardware have no resources in common.

[Timing diagram: three instructions back to back, each F D O E P, with each instruction's F overlapped with the E and P cycles of the previous instruction.]

Let's suppose, however, that our memory organization has been designed so that more than one request can be handled at a time by the memory controls and arrays. Our CPU can only issue one memory request per cycle through the MAR, but there is no reason that after the first memory request is issued we cannot issue a second request on the cycle immediately following, and a third request on the cycle after that, and so on. Then the first two instructions in the above picture might look as follows:

[Timing diagram: the second instruction's F is issued one cycle after the first instruction's F and completes during the first instruction's Operand Fetch cycle.]

The second instruction becomes available to the CPU and ready to be decoded during the Operand Fetch cycle of the first instruction. Since the CPU is still busy with the previous instruction, the second instruction must be saved in a local storage location until it can be processed. This storage location is called the Instruction Prefetch Buffer (IPB), and the process of bringing a new instruction into the CPU before it is needed is called instruction prefetching. With the second instruction in the IPB, it is available for processing as soon as the first instruction completes its putaway cycle.

Let's do some calculations. Without prefetching, the execution of two instructions took $2(T_m + 4)$ cycles, for a CPI of $T_m + 4$. With prefetching, it now takes $T_m + 4 + 4$ cycles, for a CPI of $(T_m + 8)/2 = T_m/2 + 4$. If $T_m = 10$ this is a change in CPI from 14 to 9. If we add the third instruction to the picture, we have

[Diagram P1: three instructions with prefetching; each F is issued one cycle after the previous one.]

and the CPI $= (T_m + 12)/3 = T_m/3 + 4 \approx 7.33$.

We have shown what is actually the very beginning of an instruction sequence. In the steady state (that is, we have been fetching and executing instructions for a long time), we might see something like this:

[Diagram P2: steady state with prefetching; the F cycles are completely hidden and each instruction's D, O, E, P sequence follows the previous instruction's putaway immediately.]

This yields a CPI of 4, and represents the best case performance for a four-cycle instruction execution sequence. We might generalize, then, that the best achievable CPI is equal to the number of cycles in the Fetch/Execute cycle, assuming all instructions are immediately available from the IPB.
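To make the trend concrete, here is a tiny Python sketch of the same arithmetic. The formulas are exactly the ones in the text; only the loop values are illustrative:

def cpi_no_prefetch(tm: int) -> float:
    """Every instruction pays the full instruction fetch: Tm + 4 cycles."""
    return tm + 4

def cpi_prefetch(tm: int, n: int) -> float:
    """Only the first fetch is exposed; the other n-1 fetches overlap."""
    return (tm + 4 * n) / n

tm = 10
print("without prefetch:", cpi_no_prefetch(tm))          # 14
for n in (2, 3, 100):
    print(f"with prefetch, n={n}:", round(cpi_prefetch(tm, n), 2))
# n=2 -> 9.0, n=3 -> 7.33, n=100 -> 4.1: approaching the steady-state CPI of 4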

Of course, we have ignored a lot of practical difficulties. First, we have assumed (unstated) that there was no limit to the number of instructions we could prefetch. In practice, resource constraints will put an upper bound on the size of the IPB and the number of instructions we can hold in it. Second, we have not taken into consideration that instructions are not always executed in the order in which they reside in memory (which is the order we are fetching them in). For instance, any one of the instructions might be a branch, which requires execution to be continued from some other instruction location that has not been prefetched. In this case the prefetch buffer will need to be flushed (reset, emptied) and a new sequence of instructions will be fetched. We will come back to this later. Third, we have assumed that all instructions take the same number of cycles to execute; this is certainly not the case in reality. So, while it appears that we can approach a CPI equal to the average instruction processing time (counting only the time to Decode, Fetch Operands, Execute, and Putaway), we should not expect to achieve it with prefetching alone. On the other hand, we can do better, but we need more than prefetching.

Practice Problems

Consider the situation described above ($T_m = 10$ and all instructions take four cycles).

1. What is the minimum capacity of the IPB required to support a 4 CPI steady state performance?
2. At what maximum rate can we fetch instructions in the steady state, regardless of the size of the IPB?

Pipelining

We saw, during our discussion of CPU controls, that the Instruction Register is the source of decoded control signals for the entire sequence of instruction processing, and is unavailable to another instruction until the putaway cycle for the current instruction has completed. Let's suppose that we replicate the IR so that there are four IR registers, connected serially to each other as shown in Diagram P3. We will label them IR1 thru IR4. Further assume that the instruction is initially loaded into IR1, but that on the next cycle it is moved from IR1 to IR2, leaving IR1 empty and available. Similarly, on each subsequent cycle the instruction moves from IR2 to IR3, then from IR3 to IR4, and finally leaves the CPU altogether. While the instruction is in IR1, the op code field is decoded and the Decode cycle of the instruction execution sequence is completed. While it is in IR2, the operand fields are decoded and the Operand Fetch cycle is performed; while it is in IR3, the actual ALU operation is decoded off the Op Code field and execution takes place; and in IR4 the Putaway cycle occurs. This is diagrammed in [P3] below.

    IR1   Decode Op Code of Instruction
    IR2   Decode Operand Fields and Fetch Operand
    IR3   Decode Op Code and control ALU
    IR4   Decode Result Operand Field and Putaway

    Diagram P3

Each IRx is supplied with its own set of decoders, as appropriate. (Notice, by the way, that we have eliminated, in an expensive fashion, the need for a timing chain or counter to control the timing of control signals.) With this organization, we can see that when an instruction has moved from IR1 to IR2, IR1 is now available for the next instruction. On the next cycle, the first instruction moves to IR3, the second instruction moves to IR2, and we can immediately start a third instruction in IR1. Diagram [P1] can now be redrawn as follows:

[Diagram P1': with the pipeline, a new instruction enters IR1 (Decode) on every cycle once its fetch has completed.]

In the steady state this becomes:

[Diagram P2': steady-state pipeline; a new instruction enters IR1 and a putaway completes on every cycle.]

The register organization shown in Diagram P3 is called a pipeline, by analogy with any pipeline (such as an oil pipeline) into which things are sequentially loaded at one end and flow through the pipe until they come out, in the same order, at the other end. In the steady state diagram [P2'], notice that a putaway occurs on every cycle. It appears that we have finally achieved our goal of a CPI of 1. Of course, the concerns expressed previously still hold, especially the effect on the pipeline of branches in the instruction stream.
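A minimal cycle-by-cycle simulation of the IR1..IR4 organization makes the one-putaway-per-cycle claim easy to check. This sketch assumes the ideal case in which an instruction is always waiting in the IPB; the instruction names are made up:

# Simulate the four-register IR pipeline of Diagram P3.
program = ["I1", "I2", "I3", "I4", "I5", "I6"]
pipe = [None, None, None, None]          # IR1, IR2, IR3, IR4
cycles = putaways = 0

while program or any(pipe[:3]):          # run until the pipe drains
    cycles += 1
    pipe = [program.pop(0) if program else None] + pipe[:3]  # shift IR1 -> IR4
    if pipe[3] is not None:              # instruction in IR4: Putaway occurs
        putaways += 1
    print(f"cycle {cycles}: IR1..IR4 =", [i or "-" for i in pipe])

print("CPI =", cycles / putaways)        # 9/6 = 1.5 here; tends to 1 for long runs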

Practice Problem

In the steady state Diagram [P2']:

1. At what rate must instruction fetches be issued to memory?
2. What is the minimum size of IPB required?
3. What restriction, if any, must be placed on the length of $T_m$?

Branch Prediction

As we have seen, there are two types of branches: conditional and unconditional. In the case of an unconditional branch there is no question but that the instructions which follow the branch in memory will not be executed. If they have been prefetched, then the IPB contains instructions that apparently will never be executed; they have to be flushed out of the IPB and a new sequence of instructions must be fetched starting at the target of the branch.[24] In the case of conditional branches, a number of strategies are available. We will consider four of them.

[24] In some cases the unconditional branch may have as its target instructions that are, in fact, in the IPB, just not the next sequential one. With the addition of appropriate hardware this can be detected and flushing the IPB may be prevented. Of course, we can also prevent flushing a large percentage of the time by making the IPB large enough to hold most loops in their entirety.

1. Assume the branch is always taken. This strategy arises from the observation that many, if not most, branch instructions appear as the decision point in a loop structure. For instance, consider the following subset of instructions from some instruction sequence:

              load R, m
    loop_adr: instruction 1
              ..
              instruction n
              decrement R
              jnz loop_adr
              instruction k

In this sequence of instructions, register R is loaded with some positive integer m. Then instructions 1 through n are executed and the value in R is decreased (decremented) by 1. The jnz (Jump if Not Zero) instruction examines the Zero flag bit (which was set as a result of the decrement instruction). If the contents of R are not zero, then instruction execution continues with the instruction at loop_adr (instruction 1); otherwise instruction k (and the instructions that follow it) are executed. Since R contains some integer which is usually much larger than 1, in most cases the branch is taken. That is, instructions 1 through n are executed many times and instruction k is only executed once. It makes sense, then, in a pipelined machine, to assume that such conditional branches are, in fact, always taken. This design will make the right decision most of the time, and it is only when R goes to zero and we fall through the loop that the IPB and the pipeline will need to be flushed. In the best case, the IPB is large enough so that instructions 1 through n can all reside in it at the same time, eliminating the need for any instruction memory accesses while the loop is in operation. The benefits of this form of branch prediction are small, at best.

2. Predict depending on branch instruction. This is a refinement of strategy 1 described above. In strategy 1, the branch is assumed to be taken regardless of the condition being tested. Suppose the example above is modified as follows; the changed instructions are marked.[25]

              load R, m
    loop_adr: instruction 1
              ..
              instruction n
              decrement R
              jz next_adr       ; changed
              jmp loop_adr      ; changed
    next_adr: instruction k

It would not be a good idea here to assume the jump (jump on zero, jz) is always taken, since the programmer has here elected to write the program in such a way that it is only taken when the loop is finished. In fact, based on these two examples, we might guess that jnz should always be assumed to be taken, but jz should always be assumed to be not taken! It turns out that the percentage of time we guess correctly can be improved if we examine the kind of branch and base the decision about which instruction stream to fetch on that, rather than blindly always assuming the branch is taken, as was done with strategy 1. The branch penalty can be reduced to about 70% of that without branch prediction.

[25] This sequence performs exactly as the previous one. Such coding is sometimes required due to restrictions placed on the operands of conditional branches by a particular architecture. For instance, in Intel architectures the target of a conditional branch must be within 128 bytes of the branch instruction. If the loop is larger than 128 bytes, then the second sequence shown must be used instead of the first.

3. Branch History Table. A Branch History Table (BHT) keeps track of what occurred previously on the execution of each branch instruction. In the simplest case, a single bit can be associated with each kind of branch instruction. If a particular bit is set to 0, that means that on the previous execution of this instruction the branch was not taken. We would then guess that the next time the same instruction occurs we should continue fetching the in-line instruction stream. If the branch is taken, then the bit for that branch is set to 1, and the next time that branch occurs we will guess to take it, and start fetching the target instruction stream instead of the in-line instruction stream. In the loop example from strategy 1, for instance, the first time we encounter the jnz instruction we would incorrectly guess to continue with instruction k; however, we would set the bit in the BHT to 1, and the next time the jnz instruction was encountered (the second time through the loop) we would see the bit in the BHT is now set to 1 and guess to fetch instruction 1. We will continue to guess to fetch instruction 1 (correctly) until register R becomes zero, and then we will misguess again. The BHT bit for jnz will be set back to 0. A refinement of this scheme uses more than one bit for each branch instruction, so that more than just the last instance of the instruction's execution can be recorded. With two bits, for instance, we can record how many of the last four times a jnz instruction was encountered the branch was taken.

The above BHT schemes provide significant improvement over strategies 1 and 2, but none of these schemes takes into account that the same instruction type can occur in multiple places in a program. In the worst case, it is possible that an additional jnz instruction, for instance, may occur as one of the instructions 1 thru n in the loop, totally destroying the benefits of these schemes. For this reason, it is usual that a BHT not only record the previous actions for each branch instruction, but also the addresses associated with the branch instruction. Thus, a different guess can be made for the branch instruction at the address at the end of the loop than is made for a branch instruction at an address in the middle of the loop, or someplace else entirely. Further refinements and performance improvements can be achieved by recording both the address of the branch and the target address, along with the previous history. Using Branch History Tables, the branch penalty can be reduced to about 50% of the penalty without branch prediction.

4. Fetch both Instruction Streams. Another solution, which attempts to avoid the question of branch prediction altogether, is to simply prefetch both the target and in-line instruction streams. This implies two Instruction Prefetch Buffers, or some way of tagging instructions to indicate which instruction belongs to which stream. This solution, however, does not completely solve the branch problem, for the simple reason that either, or both, of the streams may contain additional branches. If an additional branch is encountered before the previous branch has been resolved, then we would need to start prefetching a third instruction stream, and so on. The benefits of this approach depend on the distribution of branches in the kinds of programs typically run on the machine under consideration. For instance, if branches do not occur too close together, this solution eliminates 100% of all branch penalties. But this is not the general case.

In practice, combinations of options 2, 3, and 4 are all implemented to minimize branch penalties in pipelines.
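As an illustration of scheme 3, here is a small Python sketch of a BHT indexed by branch address. It uses a two-bit saturating counter, one common form of the two-bit refinement mentioned above (not necessarily the exact scheme the notes have in mind); the address and the outcome sequence are made up:

class BranchHistoryTable:
    def __init__(self):
        self.counters = {}                   # branch address -> 2-bit counter

    def predict(self, addr: int) -> bool:
        """Guess 'taken' when the counter is in one of its two upper states."""
        return self.counters.get(addr, 0) >= 2

    def update(self, addr: int, taken: bool) -> None:
        """Saturating count: move toward 3 on taken, toward 0 on not taken."""
        c = self.counters.get(addr, 0)
        self.counters[addr] = min(c + 1, 3) if taken else max(c - 1, 0)

# A loop branch (say, the jnz at address 0x40) taken 9 times, then falling through:
bht = BranchHistoryTable()
outcomes = [True] * 9 + [False]
hits = 0
for taken in outcomes:
    hits += bht.predict(0x40) == taken
    bht.update(0x40, taken)
print(f"{hits}/{len(outcomes)} correct")     # 7/10: two warm-up misses, one at loop exit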

Practice Problems

1. Assuming branches are encountered frequently in a given program, order the four branch penalty reduction strategies from least to most beneficial.
2. In general, lacking any knowledge of specific program mixes, which branch penalty reduction method is the best?

Practice Problems (Cont.)

As an example, suppose you are given the sequence of instructions ADD, MPY, BRC, ADD, ADD and are asked to draw the timing diagram for the case of a correctly guessed branch. The correct diagram is shown below. (Note that the branch cannot be resolved until the previous MPY has completed its execution cycle.)

[Timing diagram: rows for ADD, MPY, BRC, ADD, ADD; the BRC's execute cycle stalls (shown as --) until the MPY's E cycle completes, delaying the two ADDs that follow it.]

1. What is the CPI for the sequence shown? (The branch is counted as an instruction.)
2. Draw a similar diagram for the case when the BRC prediction is incorrect (misguessed). Assume a single MPY instruction is in the alternative (target) instruction stream, and that it takes one cycle to fetch the new instructions.
3. What is the CPI for the diagram in question 2? (The branch is counted as an instruction.)
4. Consider the following sequence of instructions. These are three-address instructions which are otherwise identical to the instructions described above.

       MPY R1, R6, R7    (R1 ← R6 × R7)
       ADD R8, R3, R2    (R8 ← R3 + R2)
       ADD Rx, Ry, Rz    (Rx ← Ry + Rz)

   a. Draw the timing diagram assuming out-of-order execution is not allowed.
   b. Draw three timing diagrams assuming out-of-order execution is allowed, there are separate multiply and add execution elements, and x, y, and z in the third ADD are actually the following (register) numbers:
      i.   x = 4, y = 5, z = 9
      ii.  x = 4, y = 1, z = 9
      iii. x = 1, y = 5, z = 9
   c. Identify the kind of dependency, if any, introduced by the assignments of x and y for each of i, ii, and iii in part b.
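For checking hand-drawn diagrams like the one above, a small printing helper can be handy. Everything here is a made-up illustration (a two-ADD schedule in the ideal pipeline), not the answer to any of the numbered questions:

def draw(schedule, width):
    """schedule: list of (name, {cycle: stage letter}) pairs."""
    print(" " * 8 + " ".join(f"{c:>2}" for c in range(1, width + 1)))
    for name, stages in schedule:
        cells = " ".join(f"{stages.get(c, '-'):>2}" for c in range(1, width + 1))
        print(f"{name:<8}{cells}")

# Two back-to-back ADDs in the four-stage pipe (D, O, E, P):
draw([("ADD1", {1: "D", 2: "O", 3: "E", 4: "P"}),
      ("ADD2", {2: "D", 3: "O", 4: "E", 5: "P"})], width=5)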

Out-of-Order Execution

In addition to branches, the efficiency and effectiveness of a pipelined machine can be seriously reduced when instructions cannot all be executed in the same number of cycles. Some instructions, such as a multiply (MPY) instruction, take much longer than others (SHIFT or ADD, for instance). In some cases, the resources needed to execute an instruction are not available. For instance, many machines improve the performance of a multiply instruction by having separate special-purpose hardware (in addition to the ALU) devoted to executing a multiply instruction. Floating Point (FLP) operations are also usually executed with dedicated hardware reserved for this purpose.[26] If a floating point instruction is decoded and the FLP execution unit is in the process of executing a previous FLP instruction, the pipeline is stalled until the FLP unit becomes available. In the diagram below, four instructions are shown: an ADD, two MPY instructions (which take five execute cycles using the ALU), and another ADD.

[Diagram P4: ADD, MPY, MPY, ADD through the D, O, E, P stages; each MPY occupies the ALU's E stage for five cycles, stalling the instructions behind it.]

In this diagram, the pipeline has stalled after the first MPY instruction has started executing, because the ALU is now busy for five cycles instead of just one. There is now a 4-cycle gap, or bubble, in the pipeline before the second MPY instruction can execute, and there is an 8-cycle bubble between the operand fetch of the second ADD and its execution. Let's suppose that the ALU is not required for the MPY instructions because there is additional special-purpose hardware available, and this hardware allows an MPY to be executed in just 2 execution cycles. Diagram [P4] now looks like this:

[Diagram P5: the same four instructions with a dedicated two-cycle multiply unit; the bubbles shrink accordingly.]

[26] Processors with multiple execution engines, such as floating point units, multiply units, shifters, etc., are referred to as superscalar processors. They are capable of executing multiple instructions simultaneously, one in each special unit.

The bubbles in the pipeline have been reduced from 4 and 8 cycles to 1 and 2, a 75% reduction in penalty. But notice that while the MPY instructions are executing, the ALU is idle. It would certainly be nice if the second ADD instruction could make use of the ALU without having to wait for the MPYs to finish. We would then have the following picture:

[Diagram P6: the second ADD executes on the idle ALU while the MPYs use the multiply unit, and completes before them.]

As far as the second ADD is concerned, there is no penalty incurred due to the extra time it takes to do an MPY instruction. We have executed the ADD out of order with respect to the order of the instructions in the program. This is best seen by looking at the order in which the Putaway cycles occur in time:

    The first ADD's putaway occurs on cycle 4
    The first MPY's putaway occurs on cycle 6
    The second MPY's putaway occurs on cycle 8
    The second ADD's putaway occurs on cycle 7

What do we have to be careful of if we want to be able to perform instructions out of order in this fashion? We need to study the data dependencies of the various instructions. The above diagrams assume that none of the instructions are using resources (registers, memory addresses) used by any other instruction. In reality this is virtually never true; it is in fact likely that each instruction builds on the results of previous instructions. Let's consider the kinds of dependencies that can exist between instructions.

1. Data Read Dependence. We cannot execute an instruction if its operands include the results of a previous instruction which has not yet completed. Accessing the operand(s) must be delayed. In the current example, we cannot execute the ADD instruction if its operands include the results of either of the MPYs. In the simplest organization, the O cycle of any instruction cannot occur before the P cycle of any previous instruction which provides results that become operands of the subsequent instruction. (Note that this is true even in the normal pipeline without out-of-order execution.) Here is a portion of diagram [P2'] modified to show what happens if the second instruction needs, as operands, the result of the first instruction:

[Diagram P7: the second instruction's O cycle is delayed until after the first instruction's P cycle.]

In this case, the operand fetch can't be done until the P cycle of the previous instruction is complete. In practice, hardware is frequently provided which examines the operand addresses of contiguous instructions and allows the results from the ALU to be fed back immediately into the ALU operand register, resulting in the following diagram:

[Diagram P8: with result forwarding, the second instruction proceeds without waiting for the first instruction's putaway.]

2. Data Store Dependence. We cannot putaway the results of an instruction if the result address is the same as the result address of a previous instruction. In the current example, if the second MPY's result address is the same as the second ADD's result address, the contents of that address will be the results of the MPY, not of the ADD. This violates a fundamental rule of computer architecture and design: the programmer (and his/her program) must observe results to occur in the order in which the instructions appear in the program. In practice, this problem is often resolved by allowing the execution of the ADD to proceed, but deferring the putaway until all previous instructions' putaways have been completed. This will still allow significant performance improvement.

[Diagram P5': the second ADD executes early, but its putaway is deferred until the putaways of both MPYs have completed.]

3. Data Store/Read Dependence. We cannot putaway the results of the second ADD if the result address is the same as the location of an operand for a previous instruction that has not yet executed. This is a relatively rare dependency, as it requires that an earlier instruction be delayed quite a bit, far enough so that its operands aren't even accessed before a later instruction has completed execution, something like:

[Diagram P9: a later instruction's P cycle falls before an earlier instruction's delayed O cycle.]

In diagram [P9] the second result is being stored in the same location as one of the first instruction's operands. It therefore cannot execute as shown, but must look like:

[Diagram P9': the second instruction's putaway is held until the first instruction's operands have been fetched.]

4. Resource Dependency. Two instructions cannot be executed out of order if they require the same resources (the ALU, for instance) at the same time.

We can summarize the first three dependencies by illustrating each with a pair of three-operand-address instructions and showing the relative positions of the shared data resource (Rx in each pair, highlighted in the original):

    Data Read Dependency:
        OP Rx, Rw, Rz    (Rx ← Rw OP Rz)
        OP Ry, Rx, Rz

    Data Store Dependency:
        OP Rx, Ry, Rz
        OP Rx, Rv, Rw

    Read/Store Dependency:
        OP Ry, Rx, Rz
        OP Rx, Rw, Rz

There are other dependencies which can occur that we will not go into here. The existence of dependencies such as those discussed here requires hardware to be implemented which examines all operand and result addresses of the instructions in the pipeline and allows or prevents out-of-order execution as appropriate, always observing the rule that, at the end of the day, the program and user should observe results in the order of the instructions in the program.
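The summary above translates directly into the check such hardware must perform. Here is a Python sketch that classifies the dependency between two three-address instructions written as (result, operand, operand) tuples; the test pairs are exactly those in the summary:

def classify(first, second):
    """Return the dependencies of `second` on `first` (result, src1, src2)."""
    r1, s1a, s1b = first
    r2, s2a, s2b = second
    kinds = []
    if r1 in (s2a, s2b):
        kinds.append("data read dependence (second reads first's result)")
    if r2 == r1:
        kinds.append("data store dependence (same result register)")
    if r2 in (s1a, s1b):
        kinds.append("store/read dependence (second overwrites first's operand)")
    return kinds or ["none"]

print(classify(("Rx", "Rw", "Rz"), ("Ry", "Rx", "Rz")))  # data read
print(classify(("Rx", "Ry", "Rz"), ("Rx", "Rv", "Rw")))  # data store
print(classify(("Ry", "Rx", "Rz"), ("Rx", "Rw", "Rz")))  # store/read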

Practice Problems

1. Calculate the CPI for the following diagrams in the notes:

       a. P4    b. P5    c. P6    d. P5'    e. P7    f. P8

2. Consider a CPU design with a four-stage pipeline (Decode, Operand Fetch, Execute, and Putaway), a multiply unit as well as an ALU, and an architecture with three instructions of different lengths (in cycles), as follows:

   Multiply (MPY) takes six cycles to process. The instruction format is MPY Rx, Ry and it performs the operation Rx ← [Rx] × [Ry]. [Timing diagram omitted.]

   Add (ADD) takes four cycles to process. Both operands are in GPRs and are made available in one cycle. The instruction format is ADD Rx, Ry and it performs the operation Rx ← [Rx] + [Ry].

   Branch on Condition (BRC) takes two cycles to process. During the execute cycle the results of the previous ALU operation are examined and either the target address or the next instruction is executed, as appropriate. Assume branch prediction of some sort. If the branch prediction is incorrect, assume that the fetch of the next instruction takes one cycle. The instruction format is BRC Target and it performs the operation PC ← Target if the condition is met.

   [Timing diagram: BRC occupies D and E; on a misguessed branch the target instruction's F, D, O sequence begins after the branch resolves.]

   Consider the following sequence of instructions. These are three-address instructions which are otherwise identical to the instructions described above.

       MPY R1, R6, R7    (R1 ← R6 × R7)
       ADD R8, R3, R2    (R8 ← R3 + R2)
       ADD Rx, Ry, Rz    (Rx ← Ry + Rz)

   a. Draw the timing diagram assuming out-of-order execution is not allowed.
   b. Draw three timing diagrams assuming out-of-order execution is allowed, there are separate multiply and add execution elements, and x, y, and z in the third ADD are actually the following (register) numbers:
      i.   x = 4, y = 5, z = 9
      ii.  x = 4, y = 1, z = 9
      iii. x = 1, y = 5, z = 9
   c. Identify the kind of dependency, if any, introduced by the assignments of x and y for each of i, ii, and iii in part b.


More information

CS433 Homework 2 (Chapter 3)

CS433 Homework 2 (Chapter 3) CS433 Homework 2 (Chapter 3) Assigned on 9/19/2017 Due in class on 10/5/2017 Instructions: 1. Please write your name and NetID clearly on the first page. 2. Refer to the course fact sheet for policies

More information

Computer Science 324 Computer Architecture Mount Holyoke College Fall Topic Notes: Data Paths and Microprogramming

Computer Science 324 Computer Architecture Mount Holyoke College Fall Topic Notes: Data Paths and Microprogramming Computer Science 324 Computer Architecture Mount Holyoke College Fall 2007 Topic Notes: Data Paths and Microprogramming We have spent time looking at the MIPS instruction set architecture and building

More information

Minimizing Data hazard Stalls by Forwarding Data Hazard Classification Data Hazards Present in Current MIPS Pipeline

Minimizing Data hazard Stalls by Forwarding Data Hazard Classification Data Hazards Present in Current MIPS Pipeline Instruction Pipelining Review: MIPS In-Order Single-Issue Integer Pipeline Performance of Pipelines with Stalls Pipeline Hazards Structural hazards Data hazards Minimizing Data hazard Stalls by Forwarding

More information

6x86 PROCESSOR Superscalar, Superpipelined, Sixth-generation, x86 Compatible CPU

6x86 PROCESSOR Superscalar, Superpipelined, Sixth-generation, x86 Compatible CPU 1-6x86 PROCESSOR Superscalar, Superpipelined, Sixth-generation, x86 Compatible CPU Product Overview Introduction 1. ARCHITECTURE OVERVIEW The Cyrix 6x86 CPU is a leader in the sixth generation of high

More information

COMPUTER ORGANIZATION AND DESIGN

COMPUTER ORGANIZATION AND DESIGN COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle

More information

Advanced Parallel Architecture Lesson 3. Annalisa Massini /2015

Advanced Parallel Architecture Lesson 3. Annalisa Massini /2015 Advanced Parallel Architecture Lesson 3 Annalisa Massini - 2014/2015 Von Neumann Architecture 2 Summary of the traditional computer architecture: Von Neumann architecture http://williamstallings.com/coa/coa7e.html

More information

William Stallings Computer Organization and Architecture. Chapter 11 CPU Structure and Function

William Stallings Computer Organization and Architecture. Chapter 11 CPU Structure and Function William Stallings Computer Organization and Architecture Chapter 11 CPU Structure and Function CPU Structure CPU must: Fetch instructions Interpret instructions Fetch data Process data Write data Registers

More information

Branch Prediction & Speculative Execution. Branch Penalties in Modern Pipelines

Branch Prediction & Speculative Execution. Branch Penalties in Modern Pipelines 6.823, L15--1 Branch Prediction & Speculative Execution Asanovic Laboratory for Computer Science M.I.T. http://www.csg.lcs.mit.edu/6.823 6.823, L15--2 Branch Penalties in Modern Pipelines UltraSPARC-III

More information

Suggested Readings! Recap: Pipelining improves throughput! Processor comparison! Lecture 17" Short Pipelining Review! ! Readings!

Suggested Readings! Recap: Pipelining improves throughput! Processor comparison! Lecture 17 Short Pipelining Review! ! Readings! 1! 2! Suggested Readings!! Readings!! H&P: Chapter 4.5-4.7!! (Over the next 3-4 lectures)! Lecture 17" Short Pipelining Review! 3! Processor components! Multicore processors and programming! Recap: Pipelining

More information

Pipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome

Pipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome Pipeline Thoai Nam Outline Pipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome Reference: Computer Architecture: A Quantitative Approach, John L Hennessy

More information

Architectures & instruction sets R_B_T_C_. von Neumann architecture. Computer architecture taxonomy. Assembly language.

Architectures & instruction sets R_B_T_C_. von Neumann architecture. Computer architecture taxonomy. Assembly language. Architectures & instruction sets Computer architecture taxonomy. Assembly language. R_B_T_C_ 1. E E C E 2. I E U W 3. I S O O 4. E P O I von Neumann architecture Memory holds data and instructions. Central

More information