COMPUTER ORGANIZATION AND DESIGN
- Evelyn Jeffry Caldwell
- 6 years ago
1 COMPUTER ORGANIZATION AND DESIGN, ARM Edition: The Hardware/Software Interface
Chapter 4: The Processor
Modified and extended by R.J. Leduc
2 Digital Logic Introduction
To understand this chapter, you will need some basic digital logic concepts. Earlier, we discussed the AND, OR, NOT, and XOR (exclusive OR) logic gates; you should go back and review them. You should also review our earlier discussion of how RAM works and the Boolean logic axioms we covered. We will shortly discuss how multiplexors and registers work. This section of slides includes information from the Digital Logic Introduction section. Chapter 4 The Processor 2
3 Logic Design Basics
Information in the CPU is encoded in binary: low voltage = 0 and high voltage = 1, with one wire per bit. Multi-bit data is encoded on multi-wire buses.
Combinational components operate on data. Their output is a function of the current inputs only (i.e. no history). Examples are circuits built from AND, OR, and NOT gates without any feedback loops.
State (sequential) elements store information (e.g. flip-flops and registers).
4 2-Input Multiplexor
A 2-input multiplexor has two data inputs, x1 and x2, and one output, f. A third input, s, selects which data input is transmitted to the output: if s = 0, f has the same value as x1; if s = 1, f has the same value as x2.
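One standard way to build this behaviour from the gates reviewed earlier is f = (NOT s AND x1) OR (s AND x2). The Python sketch below models that expression on single bits and prints its truth table; the function name `mux2` is our own and not from the text.

```python
def mux2(x1: int, x2: int, s: int) -> int:
    """2-input multiplexor on single bits: f = (NOT s AND x1) OR (s AND x2)."""
    not_s = s ^ 1  # NOT gate on a single bit
    return (not_s & x1) | (s & x2)


# Truth table: f follows x1 when s = 0 and follows x2 when s = 1.
print("s x1 x2 | f")
for s in (0, 1):
    for x1 in (0, 1):
        for x2 in (0, 1):
            print(f"{s} {x1}  {x2}  | {mux2(x1, x2, s)}")
```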
5 2-Input Multiplexor II
6 4-Input Multiplexor
For 4 inputs, we need two select lines: s1 and s0. To select between 32 possible input sources, we would need five select inputs (since 2^5 = 32).
7 Multiplexor for 64-Bit Registers
A 32-input multiplexor can select between 32 1-bit sources. To select between 32 64-bit registers, we would need an array of 32-input multiplexors: one for each bit of the register (64 multiplexors in total). For example, the first 32-input multiplexor would select between bit b0 of each of the LEGv8 registers, and all 64 multiplexors would share the same five select lines.
8 Combinational Components
n-bit, m x 1 multiplexor: m data inputs I0 to I(m-1), each n bits wide, and a select input S of log2 m bits. O = I0 if S = 0..00, O = I1 if S = 0..01, and so on up to O = I(m-1) if S = 1..11.
log2 n x n decoder: an input I of log2 n bits and outputs O0 to O(n-1). O0 = 1 if I = 0..00, O1 = 1 if I = 0..01, and so on up to O(n-1) = 1 if I = 1..11. With an enable input e, all outputs are 0 if e = 0.
n-bit adder: inputs A and B (n bits each). sum = A + B (first n bits); carry = (n+1)th bit of A + B. With a carry-in input Ci, sum = A + B + Ci.
n-bit comparator: inputs A and B (n bits each). less = 1 if A < B; equal = 1 if A = B; greater = 1 if A > B.
n-bit, m-function ALU: inputs A and B (n bits each) and a select input S of log2 m bits. O = A op B, where op is determined by S. It may also have status outputs (carry, zero, etc.).
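To make the decoder and adder definitions concrete, here is a minimal Python sketch that follows them directly: the decoder raises exactly one output unless enable is 0, and the adder returns the low n bits of A + B + Ci as the sum and bit n as the carry. The function names are illustrative only.

```python
def decoder(value: int, n_outputs: int, enable: int = 1) -> list:
    """log2(n) x n decoder: output `value` is 1 and all others 0; all 0 if enable is 0."""
    return [1 if (enable and i == value) else 0 for i in range(n_outputs)]


def adder(a: int, b: int, n: int, carry_in: int = 0) -> tuple:
    """n-bit adder: sum is the first n bits of A + B + Ci, carry is bit n."""
    total = a + b + carry_in
    mask = (1 << n) - 1
    return total & mask, (total >> n) & 1


print(decoder(2, 4))            # [0, 0, 1, 0]
print(decoder(2, 4, enable=0))  # [0, 0, 0, 0]
print(adder(0b1111, 0b0001, 4)) # (0, 1): 15 + 1 overflows a 4-bit sum
```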
9 Combinational Elements
Adder: Y = A + B.
Multiplexor: Y = S ? I1 : I0.
Arithmetic/Logic Unit (ALU): Y = F(A, B), where the control input F selects the function.
10 Tri-state Buffer
When e = 1, the tri-state buffer acts like a normal buffer, i.e. output = input. When e = 0, the output is electrically disconnected from the input. For a shared data bus, each register's outputs go through tri-state buffers; as long as only one register enables its output at a time, there is no conflict. This is more efficient than using multiplexors when a large number of devices share a common wire.
11 D Flip-Flops
The basic building block of a register is the D (data) flip-flop (FF). A positive edge-triggered D flip-flop stores a single bit of data. On the rising edge of the clock (when the signal changes from low to high), it stores the value at the D input; the stored value then appears at the Q output after a short delay. Changes at the D input are otherwise ignored.
We can add an enable input to the flip-flop. When enable is 1, it behaves as above; when enable is 0, it keeps its current value and ignores the clock signal.
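The edge-triggered behaviour can be sketched in software by remembering the previous clock level and capturing D only on a 0-to-1 transition while enable is 1. This is a behavioural model under that simplification; it ignores the setup/hold timing and propagation delay of a real flip-flop, and the class name is our own.

```python
class DFlipFlop:
    """Behavioural model of a positive edge-triggered D flip-flop with enable."""

    def __init__(self) -> None:
        self.q = 0          # stored bit, visible at the Q output
        self._last_clk = 0  # previous clock level, used to detect a rising edge

    def tick(self, clk: int, d: int, enable: int = 1) -> int:
        rising_edge = self._last_clk == 0 and clk == 1
        if rising_edge and enable:
            self.q = d      # capture D only on a low-to-high clock transition
        self._last_clk = clk
        return self.q


ff = DFlipFlop()
ff.tick(0, 1)                  # clock low: D is ignored, Q stays 0
ff.tick(1, 1)                  # rising edge: Q becomes 1
ff.tick(1, 0)                  # clock still high: change on D ignored
ff.tick(0, 0)
ff.tick(1, 0, enable=0)        # rising edge, but disabled: Q keeps its value
print(ff.q)                    # 1
```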
12 D Flip-Flops II
The devices shown, from the top, are: a D latch (ignore this one), a positive edge-triggered D FF, and a negative edge-triggered D FF (which stores on the clock's high-to-low transition). To correctly store data, the value at the D input must be constant for a short period just before and just after the desired clock edge.
13 Registers
To create an 8-bit register, group eight D flip-flops together, giving 8 D inputs and 8 Q outputs. Connect their enable signals together, and connect their clock inputs together.
14 Registers II
The example shows two 2-bit registers (R1 and R2) connected to a 2-bit shared data bus. Register Ri (i in {1, 2}) has an input enable signal (Riin) and an output enable signal (Riout). All registers use tri-state buffers to connect to the data bus.
15 Clocking Methodology
Most data is stored in state elements such as registers. A typical operation in a processor is: one state element stores data, which appears at its output; this data then propagates through a combinational logic circuit; the result then appears at the input of a second state element.
16 Clocking Methodology II
This occurs between clock edges:
On the first rising edge of the clock, data is stored in State Element 1.
The data propagates through the combinational logic (e.g. an adder).
The output of the combinational logic is stored in State Element 2 on the next rising edge of the clock.
The clock period must be long enough for the data to reach State Element 2 and become stable, so the longest delay in the processor determines the minimum clock period.
17 Clocking Methodology III
Edge-triggered elements allow a state element to be read and written in the same clock cycle. The clock period must be long enough for the output of the combinational logic to reach the next input. Conversely, the delay through the combinational logic must be long enough that the newly loaded value of a state element cannot propagate back to its own input too quickly: the input must not change for a short period after the clock edge.
18 4.1 Introduction
From previous chapters, we saw that CPU performance is determined by:
Instruction count, which is determined by the ISA and compiler.
CPI and cycle time, which are determined by the implementation of the processor.
In examining the implementation of the CPU, we will see how the ISA determines many aspects of the CPU design, and how different implementations affect clock rate and CPI.
19 Chapter Introduction II
We will examine two LEGv8 implementations: a simplified version and a more realistic pipelined version. For both, we will examine the datapath and controller design. We will start with a highly abstract and simplified overview, and then refine the design as we add details.
20 Basic LEGv8 Implementation
To simplify things, we will focus our implementation on a subset of LEGv8 instructions. The subset demonstrates the key concepts of the datapath and controller; the remaining instructions are implemented similarly. The subset is:
Memory reference: load register, LDUR X0, [X1,#8]; store register, STUR X0, [X1,#0]
Arithmetic/logical: ADD, SUB, AND, ORR, e.g. ADD X0, X1, X2
Branching: compare and branch, CBZ X0, Label1; unconditional branch, B Label2 // will add this last to design
21 Instruction Execution
For all three classes of instructions (memory reference, arithmetic/logical, and branching), the first three steps are identical:
1. Set the program counter (PC) register to the memory location that contains the next instruction.
2. Fetch the instruction from memory.
3. Read zero, one, or two registers, using fields of the instruction to select them. LDUR and CBZ require only one register; most other instructions, including STUR (base register and data register), require two; the unconditional branch requires none.
22 Instruction Execution II
The next action depends on the instruction type. Except for unconditional branches, instructions next use the ALU:
Memory reference: address calculation.
Arithmetic/logical: the indicated operation.
Conditional branch: comparison to zero.
After the ALU, the next actions are:
Memory reference: access memory to read or write data.
Arithmetic/logical/load: store the ALU output or the data from memory into a register.
23 Instruction Execution III
Finally, we need to determine the address of the next instruction to execute. If the current instruction is a conditional branch and the specified register was zero, load the new address (the PC plus the offset specified in the instruction) into the PC register; otherwise, load PC + 4 into the PC register.
24 CPU Overview
This figure provides an abstract and simplified view, which omits two important details.
25 (1) Needs Multiplexors
We can't just join wires together; where several sources drive one input, we use multiplexors.
26 (2) Needs Control Signals
27 Logic Design Conventions
A signal is asserted if it is logically high; we assert a signal when it should be set logically high. A signal is deasserted if it is logically low; we deassert a signal when it should be set logically low.
As in the textbook, our flip-flops and registers store data only on the rising clock edge.
Data bus signals are assumed to be 64 bits wide unless specified otherwise.
Two overlapping signals are not connected unless there is a dot where they cross.
28 4.3 Building a Datapath
We will now examine the major components of a datapath needed to execute each class of LEGv8 instruction. A datapath element is a unit used to operate on or hold data within the processor. For LEGv8, these are the instruction and data memories, the register file, the ALU, and adders. A register file is a state element that consists of a set of registers that can be read or written by supplying the number of the register to be accessed. We will build the LEGv8 datapath incrementally, refining the overview design.
29 Instruction Fetch
The PC is a 64-bit register; it is incremented by 4 to point to the next instruction.
30 R-Format Instructions
Field layout: opcode (11 bits, 31:21) | Rm (5 bits, 20:16) | shamt (6 bits, 15:10) | Rn (5 bits, 9:5) | Rd (5 bits, 4:0).
An R-format instruction needs to read two register operands, perform an arithmetic/logical operation, and write the result to a register. It uses the register file and the ALU.
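The field layout can be checked with a few shifts and masks. This Python sketch packs and unpacks the five fields; the helper names are hypothetical, and the ADD opcode value 10001011000 is the standard LEGv8 encoding (it does not appear on this slide).

```python
def encode_r_format(opcode: int, rm: int, shamt: int, rn: int, rd: int) -> int:
    """Pack the five R-format fields into a 32-bit instruction word."""
    return (opcode << 21) | (rm << 16) | (shamt << 10) | (rn << 5) | rd


def decode_r_format(word: int) -> dict:
    """Split a 32-bit R-format word back into its fields."""
    return {
        "opcode": (word >> 21) & 0x7FF,  # bits 31:21 (11 bits)
        "Rm": (word >> 16) & 0x1F,       # bits 20:16 (5 bits)
        "shamt": (word >> 10) & 0x3F,    # bits 15:10 (6 bits)
        "Rn": (word >> 5) & 0x1F,        # bits 9:5 (5 bits)
        "Rd": word & 0x1F,               # bits 4:0 (5 bits)
    }


# ADD X0, X1, X2: Rd = X0, Rn = X1, Rm = X2
word = encode_r_format(0b10001011000, 2, 0, 1, 0)
print(hex(word))              # 0x8b020020
print(decode_r_format(word))
```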
31 R-Format Instructions II
The register file requires three 5-bit selection inputs to specify the two source registers and the destination register, one 64-bit input carrying the data to be written to the destination register, and two 64-bit outputs. The RegWrite input must be asserted to write to a register on the next clock edge.
32 R-Format Instructions III
The ALU has two 64-bit inputs for operands and a 64-bit output, a 4-bit input to select which function to perform, and a 1-bit output that is asserted when the result of the operation is zero.
33 Load/Store Instructions
Field layout (D-format): opcode (11 bits) | address (9 bits) | op2 (2 bits) | Rn (5 bits) | Rt (5 bits).
To perform a load or store, we need to:
Read the base address from register Rn.
Add the 9-bit signed offset to the base address to get the data address.
For a load: read memory and update register Rt.
For a store: write the value of register Rt to memory.
34 Load/Store Instructions II
To perform these instructions, we need the register file and ALU: the register file provides the base address and the source/destination register, and the ALU adds the base address and the address offset. We also need a sign-extension unit, which takes the 32-bit instruction word as input. For a load/store, it extracts the 9-bit offset; for CBZ, it extracts the 19-bit offset. In both cases the offset is sign-extended to 64 bits.
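A minimal Python sketch of the sign-extension unit, assuming the bit positions implied by the field widths: the D-format address field sits in bits 20:12 and the CB-format offset in bits 23:5. Function names are hypothetical; the LDUR opcode 11111000010 is the standard LEGv8 encoding and is used only to build a test word.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit


def d_format_offset(word: int) -> int:
    """Load/store: 9-bit address field in bits 20:12 of the instruction."""
    return sign_extend((word >> 12) & 0x1FF, 9)


def cb_format_offset(word: int) -> int:
    """CBZ: 19-bit offset field in bits 23:5 of the instruction."""
    return sign_extend((word >> 5) & 0x7FFFF, 19)


# LDUR X0, [X1,#-8], built from fields: opcode | address | op2 | Rn | Rt
ldur = (0b11111000010 << 21) | ((-8 & 0x1FF) << 12) | (0b00 << 10) | (1 << 5) | 0
base = 0x1000                                # pretend value of X1
print(hex(base + d_format_offset(ldur)))     # 0xff8
```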
35 Load/Store Instructions III
We also need a data memory unit to read from and write to. The data memory unit has a 64-bit address input to select the memory location, a 64-bit input for write data, and a 64-bit output for read data. The MemWrite input is asserted to enable writing, and the MemRead input is asserted to specify a read operation.
36 Branch Instruction (CBZ)
Field layout (CB-format): opcode (8 bits) | offset (19 bits) | Rt (5 bits).
CBZ requires a register to test for zero and a signed address offset. The register is passed through the ALU to its output, and the ALU's zero output is set based on this value. We also need to calculate the branch target address, which is loaded into the PC register if the branch is taken (i.e. the register equals zero). To calculate the branch target address:
Sign-extend the offset to 64 bits.
Shift it left 2 places to multiply by 4 (the offset counts instruction words to jump, and each word is 4 bytes).
Add the result to the current value of the PC register (the address of the branch instruction).
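The three steps translate directly into arithmetic. In this Python sketch (names hypothetical) the result is masked to 64 bits to mimic a 64-bit adder, and `offset_field` is the raw 19-bit field taken from the instruction.

```python
MASK64 = (1 << 64) - 1  # the PC is a 64-bit register


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit


def cbz_branch_target(pc: int, offset_field: int) -> int:
    """Branch target = PC + (sign-extended 19-bit word offset << 2), kept to 64 bits."""
    byte_offset = sign_extend(offset_field, 19) << 2  # words -> bytes (x4)
    return (pc + byte_offset) & MASK64


print(hex(cbz_branch_target(0x1000, 3)))               # 0x100c: forward 3 instructions
print(hex(cbz_branch_target(0x1000, (-2) & 0x7FFFF)))  # 0xff8: back 2 instructions
```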
37 Branch Instruction II
The figure shows the segment of the datapath that handles branches; it adds a dedicated adder to calculate the branch target address. Note: Figure 4.9 of the text incorrectly uses PC + 4 instead of PC in the top adder.
38 Composing the Elements
Our first datapath version executes one instruction in each clock cycle. This means each datapath element can perform only one function per cycle. Hence, we need separate instruction and data memories, and two dedicated adders for calculating the next instruction address (PC + 4 and the branch target). We use multiplexors where different instructions use alternate data sources, and add the needed control signals.
39 R-Type/Load/Store Datapath
R-type and load/store instructions can share the register file and ALU, but:
The bottom ALU input needs to choose between the sign-extended offset and a register.
The register file's write data input needs to choose between the ALU output and the data memory output.
40 Full Datapath
This version still does not use the ALU zero output to determine the PCSrc multiplexor input.
41 4.4 A Simple Implementation Scheme: ALU Control
We will now look at the design of the control unit. To simplify the design of the main control unit, we first design a simple controller for the ALU. Our ALU offers the six functions we need; to specify the desired function, we set the value of the ALU's four control bits.
42 ALU Control II
The table below shows the available functions and the corresponding 4-bit selection patterns:
ALU control   Function
0000          AND
0001          OR
0010          add
0110          subtract
0111          pass input b
1100          NOR
Depending on the instruction type, the ALU is used for:
Load/Store: add the base register to the offset.
CBZ branch: pass the register through to the output.
R-type: the operation depends on the opcode.
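A behavioural sketch of this ALU, assuming two's-complement wrap-around at 64 bits: each 4-bit pattern from the table selects one operation, and the zero output is 1 exactly when the 64-bit result is 0. The function name and the choice to reject unused patterns with an exception are our own.

```python
MASK64 = (1 << 64) - 1  # 64-bit datapath


def alu(control: int, a: int, b: int) -> tuple:
    """64-bit ALU: returns (result, zero), where zero = 1 when the result is 0."""
    if control == 0b0000:
        result = a & b          # AND
    elif control == 0b0001:
        result = a | b          # OR
    elif control == 0b0010:
        result = a + b          # add
    elif control == 0b0110:
        result = a - b          # subtract
    elif control == 0b0111:
        result = b              # pass input b (used by CBZ)
    elif control == 0b1100:
        result = ~(a | b)       # NOR
    else:
        raise ValueError(f"unused ALU control pattern {control:04b}")
    result &= MASK64            # wrap to 64 bits (two's complement)
    return result, int(result == 0)


print(alu(0b0110, 7, 7))    # (0, 1): subtract gives 0, so zero is asserted
print(alu(0b0111, 123, 0))  # (0, 1): passing b = 0 is what CBZ relies on
```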
43 ALU Control III
We assume the main control unit provides a 2-bit ALUOp signal derived from the opcode. ALUOp alone determines the ALU control for all but the R-type instructions. X indicates a don't care condition.
Instruction  ALUOp  Operation                   Opcode field   ALU function   ALU control
LDUR         00     load register               XXXXXXXXXXX    add            0010
STUR         00     store register              XXXXXXXXXXX    add            0010
CBZ          01     compare and branch on zero  XXXXXXXXXXX    pass input b   0111
R-type       10     add                         ADD opcode     add            0010
R-type       10     subtract                    SUB opcode     subtract       0110
R-type       10     AND                         AND opcode     AND            0000
R-type       10     ORR                         ORR opcode     OR             0001
44 ALU Control Truth Table
The next step is to create a truth table. We can then use standard methods to derive a Boolean logic expression for each bit of the ALU control signal, and implement each expression as a combinational circuit. Don't cares were used to minimize the number of rows displayed.
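Before minimizing, the truth table can be written as a plain lookup. This Python sketch does that for the ALUOp table; it is not the minimized gate-level logic. The 11-bit R-type opcode values are the standard LEGv8 encodings for ADD, SUB, AND, and ORR, which the slide's table lists only by name, so check them against the reference card in your edition.

```python
# 11-bit R-type opcodes (standard LEGv8 encodings) mapped to 4-bit ALU control.
R_TYPE_ALU_CONTROL = {
    0b10001011000: 0b0010,  # ADD -> add
    0b11001011000: 0b0110,  # SUB -> subtract
    0b10001010000: 0b0000,  # AND -> AND
    0b10101010000: 0b0001,  # ORR -> OR
}


def alu_control(alu_op: int, opcode: int) -> int:
    """Return the 4-bit ALU control from the 2-bit ALUOp and the 11-bit opcode field."""
    if alu_op == 0b00:   # LDUR/STUR: opcode is a don't care, always add
        return 0b0010
    if alu_op == 0b01:   # CBZ: pass input b so the zero output tests the register
        return 0b0111
    if alu_op == 0b10:   # R-type: the opcode field selects the operation
        return R_TYPE_ALU_CONTROL[opcode]
    raise ValueError("ALUOp = 11 is not used in this design")


print(f"{alu_control(0b10, 0b11001011000):04b}")  # 0110 (SUB -> subtract)
```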
45 The Main Control Unit
We will now create the main control unit. Its control signals are derived from the 32-bit instruction word. Note: the figure below, as well as Fig. 4.14 in the text, is incorrect: the opcode for the conditional branch is bits 31:24, NOT 31:26.
46 The Main Control Unit II
We make the following important observations:
The opcode is found in bits 31:21 (bits 31:24 for CBZ).
The first register operand is bits 9:5 (Rn), both for R-type instructions and for the base register of a load/store.
The second register operand is bits 20:16 (Rm) for R-type instructions, but bits 4:0 (Rt) for a store (the register to be stored) and for CBZ (the register to be tested). This means we require a multiplexor to select which field drives Read register 2.
The destination register for R-type and load instructions is bits 4:0.
47 Datapath With Control Unit
48 Logic for Control Signals
Read Figure 4.16 in the text for a good description of the purpose and operation of the main control signals. The values of the control signals are determined by the instruction's opcode alone. Read the description of Figure 4.18 in the text.
49 Truth Table For Control Signals
Use standard methods to derive a Boolean logic expression for each control signal, and implement each expression as a combinational circuit. Don't cares were used to minimize the table's size.
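The truth table can be sketched as one row of control values per instruction class, with `None` standing for a don't care (X). The rows below are derived from the datapath described in these slides (which field or source each multiplexor must pick, and which units must be enabled); the 0/1 meaning given to each multiplexor select (e.g. Reg2Loc = 1 selects Rt) is our reading of the datapath, so compare it with Figure 4.18 in the text.

```python
from typing import NamedTuple, Optional


class Control(NamedTuple):
    reg2loc: Optional[int]     # 1: Read register 2 = Rt (bits 4:0); 0: Rm (bits 20:16)
    alu_src: int               # 1: second ALU input = sign-extended offset; 0: register
    mem_to_reg: Optional[int]  # 1: write data comes from memory; 0: from the ALU
    reg_write: int
    mem_read: int
    mem_write: int
    branch: int
    alu_op: int                # 2-bit ALUOp sent to ALU control


X = None  # don't care

CONTROL = {
    "R-format": Control(0, 0, 0, 1, 0, 0, 0, 0b10),
    "LDUR":     Control(X, 1, 1, 1, 1, 0, 0, 0b00),
    "STUR":     Control(1, 1, X, 0, 0, 1, 0, 0b00),
    "CBZ":      Control(1, 0, X, 0, 0, 0, 1, 0b01),
}

for name, signals in CONTROL.items():
    print(f"{name:9}", signals)
```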
50 Operation of R-type Instruction
Consider the operation of the datapath for ADD X1, X2, X3. This happens in one clock cycle, but we order the steps by the flow of information:
1. On the rising clock edge, the new instruction address is loaded into the PC register.
2. The instruction at this address is fetched from instruction memory.
3. Registers X2 and X3 are read from the register file, while the main control unit (and then ALU control) sets the control signals.
4. The ALU performs the specified operation (ADD, SUB, AND, ORR) on the data read from the register file.
5. The output of the ALU is directed to the Write data input of the register file (destination X1).
6. On the next rising clock edge, the data is saved to X1 while the PC register is loaded with the next instruction address (PC + 4), and the process repeats.
51 R-Type Instruction
52 Operation of Load Instruction
Consider the operation of the datapath for LDUR X1, [X2,offset]:
1. On the rising clock edge, the new instruction address is loaded into the PC register.
2. The instruction at this address is fetched from instruction memory.
3. Register X2 is read from the register file, while the main control unit (and then ALU control) sets the control signals.
4. The ALU computes the sum of X2 and the sign-extended address offset.
5. The output of the ALU is used as the data memory address (MemRead asserted).
6. On the next rising clock edge, the output of data memory is saved to X1 while the PC register is loaded with the next instruction address (PC + 4), and the process repeats.
53 Load Instruction
For a store operation, read the description of Figure 4.20 in the text.
54 Operation of CBZ Instruction
Consider the operation of the datapath for CBZ X1, offset:
1. On the rising clock edge, the new instruction address is loaded into the PC register.
2. The instruction at this address is fetched from instruction memory.
3. Register X1 is read from the register file (via Read register 2, selected by I[4:0]), while the main control unit (and then ALU control) sets the control signals.
4. The ALU passes the value of Read data 2 to its output, setting zero = 1 if the result is 0 and zero = 0 otherwise.
5. The value of the PC is added to the sign-extended offset shifted left by 2 (i.e. the branch target address).
6. On the next rising clock edge, the PC register is loaded with the branch target address if zero is asserted; otherwise it is loaded with the next instruction address (PC + 4), and the process repeats.
55 CBZ Instruction
56 Implementing Unconditional Branch
Field layout (B-format): opcode = 5 (bits 31:26) | offset (bits 25:0).
B is like CBZ, but it has a 26-bit offset and always branches. The branch target is calculated by adding the PC to the sign-extended offset shifted left by 2. To implement it:
Add a new control unit output called Uncondbranch that is asserted only when bits 31:26 of the instruction equal 5 (i.e. a B instruction).
Add an OR gate for the select input of the top-right multiplexor, so that it selects the branch target when Uncondbranch is asserted OR when Branch and the ALU zero output are both asserted.
Extend the sign-extension unit so that it can also select the 26-bit offset from a B instruction.
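The next-PC selection after adding B can be sketched as below. The PC-source select is Uncondbranch OR (Branch AND Zero), and the branch target uses the 26-bit field for B and the 19-bit field for CBZ. Letting Uncondbranch choose the offset width is our simplifying assumption about how the extended sign-extension unit is driven; the function names are hypothetical, and the CBZ opcode 10110100 is the standard LEGv8 encoding, used only to build a test word.

```python
MASK64 = (1 << 64) - 1  # the PC is a 64-bit register


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit


def branch_target(pc: int, word: int, uncond_branch: int) -> int:
    """Pick the B (bits 25:0) or CBZ (bits 23:5) offset, sign-extend, x4, add to PC."""
    if uncond_branch:
        offset = sign_extend(word & 0x3FFFFFF, 26)
    else:
        offset = sign_extend((word >> 5) & 0x7FFFF, 19)
    return (pc + (offset << 2)) & MASK64


def next_pc(pc: int, word: int, branch: int, uncond_branch: int, zero: int) -> int:
    pc_src = uncond_branch | (branch & zero)  # the added OR gate
    return branch_target(pc, word, uncond_branch) if pc_src else (pc + 4) & MASK64


b_word = (0b000101 << 26) | 4                    # B with a +4 instruction offset
cbz_word = (0b10110100 << 24) | (4 << 5) | 1     # CBZ X1 with a +4 instruction offset
print(hex(next_pc(0x2000, b_word, 0, 1, 0)))     # 0x2010: always taken
print(hex(next_pc(0x2000, cbz_word, 1, 0, 0)))   # 0x2004: X1 != 0, fall through
```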
57 Datapath With B Added
58 Performance Issues
The single-cycle design works correctly, but it is too inefficient for modern designs. Every instruction must use the same clock period, so the longest delay determines the clock period. The critical path is the load instruction: instruction memory → register file → ALU → data memory → register file. Varying the period for different instructions is not feasible. This violates the design principle of making the common case fast. We will improve performance by pipelining.
59 4.5 An Overview of Pipelining: Pipelining Analogy
Consider the steps needed to process multiple loads of laundry with your roommate:
1. Place the load in the washer.
2. Move the load to the dryer.
3. Place the dry load on the table and fold it.
4. Have your roommate put the clothes away.
5. Go to step 1 until all loads are finished.
This is not a very time-efficient approach, as it doesn't take into account that steps 1-4 can be done in parallel on different loads.
60 Pipelining Analogy II
Assuming we have the resources for each step to be done at the same time, we can overlap execution. For example, we immediately start the next load in the washer after moving the wet clothes to the dryer. Pipelining is a technique in which multiple steps (called stages) operate concurrently. For laundry, after the first load is underway, all four stages would be active, but on different loads.
61 Pipelining Analogy III
For the sequential version, each load takes 2 hours. For the pipelined version, the first load takes 2 hours, but each additional load takes only an additional 30 minutes!
Four loads: Speedup = 8/3.5 = 2.3
Non-stop (n loads): Speedup = 2n/(0.5n + 1.5) ≈ 4 = number of stages
As n gets large, the speedup tends to the number of stages.
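Spelling out the arithmetic: with n loads, the sequential version takes 2n hours, while the pipelined version takes 2 hours for the first load plus 0.5 hours for each of the other n - 1 loads.

```latex
\text{Speedup}(n) = \frac{2n}{2 + 0.5(n-1)} = \frac{2n}{0.5n + 1.5} = \frac{4n}{n + 3},
\qquad
\text{Speedup}(4) = \frac{16}{7} \approx 2.29,
\qquad
\lim_{n \to \infty} \frac{4n}{n + 3} = 4
```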
62 Pipelining Analogy IV
The time spent per stage (the cycle time) is the time required by the longest stage, since laundry must be passed forward by all stages at the same time. We go from a cycle time of 2 hours to one of 30 minutes. After the initial latency to complete the first load, one load completes every (new) cycle time. We have not changed the latency, but we have improved the throughput.
63 LEGv8 Pipeline
We can now apply this concept to instruction execution. LEGv8 requires five steps to execute instructions:
IF: Fetch instruction from memory
ID: Decode instruction and read registers
EX: Execute operation or calculate address
MEM: Access operand in data memory (if needed)
WB: Write result back to register (if needed)
Each step becomes one stage of the instruction pipeline. The clock period now only needs to be long enough for the slowest stage to complete.
64 Pipeline Performance
Assume the elapsed times for the execution stages are 100 ps for a register read or write and 200 ps for the other stages. The elapsed time for each instruction class is shown in the table below.
Instr      Instr fetch  Register read  ALU op  Memory access  Register write  Total time
LDUR       200 ps       100 ps         200 ps  200 ps         100 ps          800 ps
STUR       200 ps       100 ps         200 ps  200 ps                         700 ps
R-format   200 ps       100 ps         200 ps                 100 ps          600 ps
CBZ        200 ps       100 ps         200 ps                                 500 ps
65 Pipeline Performance II
Single-cycle (Tc = 800 ps) vs. pipelined (Tc = 200 ps).
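66 Pipeline Speedup
If we execute 4 instructions, the single-cycle version takes 800 ps x 4 = 3200 ps. In the pipelined version, every stage takes one 200 ps cycle, so the first instruction takes 5 x 200 ps = 1000 ps, and each of the remaining three completes 200 ps later: 1000 + 200 x 3 = 1600 ps. The speedup is 3200/1600 = 2.0. The reason is that with only 4 instructions, the time to fill the pipeline with the first instruction dominates. For a program executing billions of instructions, the speedup tends to the ratio of the clock periods, 800/200 = 4. It would tend to 5, the number of pipeline stages, only if the execution time of each stage were about the same.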
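A small Python sketch of this calculation, assuming the stage times from the Pipeline Performance table, a single-cycle clock sized for the slowest instruction (LDUR, which uses every stage), a pipelined clock sized for the slowest stage, and no hazards or stalls.

```python
STAGE_PS = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}


def single_cycle_time(n_instructions: int) -> int:
    tc = sum(STAGE_PS.values())           # 800 ps: fits LDUR, the slowest instruction
    return n_instructions * tc


def pipelined_time(n_instructions: int) -> int:
    tc = max(STAGE_PS.values())           # 200 ps: fits the slowest stage
    fill = len(STAGE_PS) * tc             # the first instruction passes through 5 stages
    return fill + (n_instructions - 1) * tc  # then one instruction completes per cycle


for n in (4, 1_000_000_000):
    print(n, single_cycle_time(n) / pipelined_time(n))  # about 2.0, then about 4.0
```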
67 Pipeline Speedup II
If all stages are balanced (i.e. all take the same time):
$$\text{Time between instructions}_{\text{pipelined}} = \frac{\text{Time between instructions}_{\text{nonpipelined}}}{\text{Number of stages}}$$
If the stages are not balanced, the speedup is less. The speedup is due to increased throughput; latency (the time for each instruction) does not decrease.
68 Pipelining and ISA Design
The LEGv8 ISA was designed for pipelining:
All instructions are 32 bits, which makes them easy to fetch in one stage and decode in the next. Compare with x86, whose instructions are 1 to 15 bytes long.
There are few and regular instruction formats, so we can decode and read registers in one step.
Load/store addressing: memory operands appear only in loads and stores. This frees the ALU to calculate the address in the 3rd stage of the pipeline and access memory in the 4th stage. Compare with x86, which can perform operations (e.g. ADD) on operands in memory and would therefore need an extra stage!
69 Hazards
Hazards are situations in which the next instruction cannot execute in the following clock cycle. There are three types:
A structural hazard occurs when a required resource is needed at the same time by more than one instruction.
A data hazard occurs when an instruction cannot execute a stage because it is waiting on data from an earlier instruction, e.g. waiting for a result to be written to a destination register.
A control hazard occurs when the instruction that was fetched is not the one needed, e.g. we don't know which instruction is needed after a conditional branch until the branch is evaluated.
70 Structural Hazards
A structural hazard occurs when different instructions conflict over simultaneous use of a resource. If the LEGv8 pipeline used a single memory, we could get a structural hazard: loads and stores require data access, and if an instruction fetch were attempted at the same time, the pipeline would have to stall for that cycle, causing a pipeline bubble. Hence, pipelined datapaths require separate instruction and data memories (or separate instruction and data caches).
71 Data Hazards
A data hazard occurs when an instruction depends on the completion of a data access by a previous instruction:
    ADD X19, X0, X1
    SUB X2, X19, X3
The result of the ADD will not be written to X19 in time to be read by the SUB, causing a delay.
72 Forwarding (aka Bypassing)
With forwarding, we use a result as soon as it is computed, rather than waiting for it to be stored in a register. This requires extra connections in the datapath.
73 Load-Use Data Hazard
We can't always avoid stalls by forwarding. Consider a load followed by a SUB instruction that uses the loaded value: if the value is not ready when needed, we can't forward it backward in time! Even with forwarding, the data won't be retrieved until one cycle too late. When we need to delay the pipeline by a cycle or more, we call this a pipeline stall.
74 Code Scheduling to Avoid Stalls
To avoid a stall, the compiler can often help by reordering code so that a load result is not used by the very next instruction. C code: A = B + E; C = B + F;
Original order (13 cycles):
    LDUR X1, [X0,#0]
    LDUR X2, [X0,#8]
    ADD  X3, X1, X2     // stall: uses X2 just loaded
    STUR X3, [X0,#24]
    LDUR X4, [X0,#16]
    ADD  X5, X1, X4     // stall: uses X4 just loaded
    STUR X5, [X0,#32]
Reordered (11 cycles):
    LDUR X1, [X0,#0]
    LDUR X2, [X0,#8]
    LDUR X4, [X0,#16]
    ADD  X3, X1, X2
    STUR X3, [X0,#24]
    ADD  X5, X1, X4
    STUR X5, [X0,#32]
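The 13 and 11 cycle counts can be reproduced with a simple model, assuming a 5-stage pipeline with full forwarding in which the only stall is one cycle whenever an instruction reads a register loaded by the instruction immediately before it. Total cycles are then 5 + (n - 1) + stalls. The `Instr` record and `cycles` function are our own illustration.

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    op: str
    dest: Optional[str]       # register written (None for STUR)
    srcs: Tuple[str, ...]     # registers read


def cycles(program, stages: int = 5) -> int:
    """Cycles on a 5-stage pipeline with forwarding; only load-use stalls are modeled."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        if prev.op == "LDUR" and prev.dest in cur.srcs:
            stalls += 1   # loaded value arrives one cycle too late to forward
    return stages + (len(program) - 1) + stalls


original = [
    Instr("LDUR", "X1", ("X0",)),
    Instr("LDUR", "X2", ("X0",)),
    Instr("ADD", "X3", ("X1", "X2")),
    Instr("STUR", None, ("X3", "X0")),
    Instr("LDUR", "X4", ("X0",)),
    Instr("ADD", "X5", ("X1", "X4")),
    Instr("STUR", None, ("X5", "X0")),
]
reordered = [original[i] for i in (0, 1, 4, 2, 3, 5, 6)]  # move the third load up

print(cycles(original), cycles(reordered))  # 13 11
```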
75 Control Hazards
Control hazards are caused by branch instructions. A conditional branch determines whether we execute the next instruction or the one at the branch target, and we don't know which until after the condition is evaluated. We can try to guess which instruction is next; a wrong guess means flushing the pipeline and loading the correct instruction. In the LEGv8 pipeline, we need to test the register and compute the target early in the pipeline, so we add extra hardware to do it in the ID stage. This still causes a one-cycle delay.
76 Stall on Branch
One solution to control hazards is to wait until the branch outcome is determined before fetching the next instruction. For LEGv8, this adds an extra cycle.
77 Branch Prediction
Longer pipelines can't readily determine the branch outcome early, so the stall penalty becomes unacceptable. A better option is to predict the outcome of the branch and stall only if the prediction is wrong. In the LEGv8 pipeline, we can assume (predict) that a branch will never be taken: fetch the instruction after the branch with no delay, and if the branch is actually taken, flush the pipeline and load the branch target instruction.
78 More-Realistic Branch Prediction
1. Static branch prediction is based on typical branch behavior, e.g. for loop and if-statement branches: always predict backward branches taken, and always predict forward branches not taken.
2. Dynamic branch prediction: hardware measures actual branch behavior, e.g. by recording the recent history of each branch, and assumes future behavior will continue the trend. When the prediction is wrong, the pipeline stalls while re-fetching, and the history is updated.
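Both schemes can be sketched in a few lines of Python. The static rule compares the target with the branch address; for the dynamic scheme we use the simplest form of "recent history", remembering only each branch's last outcome. That one-bit scheme is our choice for illustration; real predictors usually keep more state. The branch address 0x40 is hypothetical.

```python
def static_predict_taken(branch_pc: int, target_pc: int) -> bool:
    """Static rule: backward branches (loops) taken, forward branches not taken."""
    return target_pc < branch_pc


class LastOutcomePredictor:
    """Minimal dynamic predictor: remember each branch's most recent outcome."""

    def __init__(self) -> None:
        self.history = {}  # branch address -> last outcome (True = taken)

    def predict(self, branch_pc: int) -> bool:
        return self.history.get(branch_pc, False)  # unseen branch: predict not taken

    def update(self, branch_pc: int, taken: bool) -> None:
        self.history[branch_pc] = taken


# A loop-closing branch at 0x40, taken 9 times and then falling through.
predictor = LastOutcomePredictor()
mispredictions = 0
for taken in [True] * 9 + [False]:
    if predictor.predict(0x40) != taken:
        mispredictions += 1  # the pipeline would flush and re-fetch here
    predictor.update(0x40, taken)
print(mispredictions)  # 2: the first iteration and the loop exit
```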
79 Pipeline Summary: The BIG Picture
Pipelining improves performance by increasing instruction throughput: it executes multiple instructions in parallel, while each instruction has the same latency. This parallelism is achieved without requiring action by the programmer. Pipelining is subject to hazards: structural, data, and control. Instruction set design affects the complexity of the pipeline implementation.
80 4.6 Pipelined Datapath and Control: LEGv8 Pipelined Datapath
Instructions and data mostly move from left to right. The exceptions are the branch target address (fed back to the PC) and register writes (fed back from the WB stage to the register file).
81 Pipeline Registers
Once a pipeline is full, each stage is processing part of a different instruction. This means each stage needs to be independent of previous stages and to keep a copy of the instruction information that either it needs or that later stages will need. We therefore require registers between stages to hold the information produced in previous cycles.
82 Pipeline Registers II
The figure shows the pipeline registers (highlighted) added between stages. The PC register and pipeline registers are assumed to store new information every cycle, so no write enable is needed.
83 Pipeline Operation
We will first examine the flow of data through the pipeline, focusing on what data needs to be saved in the pipeline registers at each stage. We will then examine the flow of control signals through the pipeline. We start with single-clock-cycle pipeline diagrams, which show pipeline usage for a single cycle at a time and highlight the resources used. We'll look at single-clock-cycle diagrams for load and store.
84 IF Stage for Load, Store, ...
The diagram shows which resources (blue) are being used by the current stage:
On the rising edge of the clock, the PC register loads the new instruction address.
This address selects the location in instruction memory.
The selected 32-bit instruction appears at the data port of the memory.
The instruction and the 64-bit PC register contents arrive at the input of the IF/ID register.
The value PC + 4 is computed and fed to the multiplexor at the input of the PC register.
85 IF Stage for Load, Store, ... II
86 ID Stage for Load, Store, ...
On the rising edge of the clock, the current instruction and PC register contents from the previous stage are saved to the IF/ID register. The new instruction stored in IF/ID is then decoded by the main controller to generate control signals; it is also used to select the read/write registers and by the sign-extension unit. The stored PC value, the outputs of the two read registers, and the sign-extended offset arrive at the input of the ID/EX register. We need to store in the next pipeline register any data that may be needed by a later stage.
87 ID Stage for Load, Store, ... II
88 EX Stage for Load
On the rising edge of the clock, the stored PC value, the outputs of the two read registers, and the sign-extended offset are stored in the ID/EX register. The new outputs of ID/EX (the PC value and offset) are used to calculate a branch target address in case it is needed. The stored offset and the stored Read data 1 are used by the ALU to calculate the desired data memory address. The branch target address, the ALU outputs (zero and the operation result), and the stored value of Read data 2 arrive at the input of the EX/MEM register.
89 EX Stage for Load II
90 MEM Stage for Load
On the rising edge of the clock, the branch target address, the ALU outputs (zero and the operation result), and the stored value of Read data 2 are stored in the EX/MEM register. The new outputs of EX/MEM supply the address and write data inputs of the data memory unit. The stored branch target is fed back to the input multiplexor of the PC register. The stored ALU result and the Read data output of the data memory arrive at the input of the MEM/WB register.
91 MEM Stage for Load II
92 WB Stage for Load
On the rising edge of the clock, the stored ALU result and the Read data output of the data memory are stored in the MEM/WB register. The new outputs of the MEM/WB register are connected to the two inputs of the multiplexor that feeds the Write data input of the register file, so the multiplexor output arrives at that Write data input. On the next rising edge, this data will be stored in the selected write register of the register file.
93 WB for Load (with error) II
The figure highlights the wrong register number. We now see that we have made a design error: when we feed back the data to be written to the write register, we are using the register select bits from the wrong instruction! By the time the load reaches WB, the IF/ID register holds a later instruction.
94 Corrected Datapath for Load
During the instruction decode stage of our load instruction, we need to save the write register select bits into our pipeline registers, passing them from ID/EX to EX/MEM to MEM/WB. We then feed the saved select bits back to the register file together with the data to be written.
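The toy model below contrasts the two designs. It tracks register numbers only, not data. Each pipeline register holds an instruction index. With `carry_rd=True`, the destination register number travels with the instruction, as in the corrected datapath. With `carry_rd=False`, the number is read from IF/ID at write-back time. The function name and the list of destination registers are hypothetical.

```python
from typing import Optional


def simulate(dests: list[int], carry_rd: bool) -> list[tuple[int, Optional[int]]]:
    """Return (instruction index, register written in WB) per instruction.

    dests[i] is the destination register of instruction i. None means no
    valid instruction supplied the register number.
    """
    n = len(dests)
    ifid = idex = exmem = memwb = None
    writes: list[tuple[int, Optional[int]]] = []
    for edge in range(n + 3):
        # Rising clock edge: every pipeline register latches its predecessor.
        memwb, exmem, idex = exmem, idex, ifid
        ifid = {"idx": edge, "rd": dests[edge]} if edge < n else None
        if memwb is None:
            continue
        if carry_rd:
            # Corrected datapath: the rd saved in ID travelled with the instruction.
            rd = memwb["rd"]
        else:
            # Original datapath: rd comes from whatever instruction is in IF/ID.
            rd = ifid["rd"] if ifid is not None else None
        writes.append((memwb["idx"], rd))
    return writes
```

With `carry_rd=False`, the first instruction's result lands in the destination register of the instruction fetched three cycles after it. Once the pipeline drains, there is no valid instruction left in IF/ID at all, which the model shows as `None`.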
95 MEM for Store
For a store operation, the memory stage is the last active stage. The store must still go through the write-back stage, doing nothing there, because the instructions behind it are already advancing through the pipeline at the maximum rate of one stage per cycle.
96 Multi-Cycle Pipeline Diagram
This form shows the resource usage of each instruction as it progresses over time.
97 Multi-Cycle Pipeline Diagram II
A more traditional form of this type of diagram.
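A traditional multi-cycle diagram is easy to generate, because instruction i (counting from 0) is in stage s during clock cycle i + s + 1. The sketch below assumes the five stages used in these slides and ignores hazards and stalls, as the slides do. `multi_cycle_diagram` and `render` are hypothetical helpers.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def multi_cycle_diagram(instrs: list[str]) -> list[list[str]]:
    """One row per instruction, one column per clock cycle (column 0 = CC1)."""
    n_cycles = len(instrs) + len(STAGES) - 1
    rows = []
    for i in range(len(instrs)):
        row = [""] * n_cycles
        for s, stage in enumerate(STAGES):
            row[i + s] = stage  # instruction i is in stage s during CC(i+s+1)
        rows.append(row)
    return rows


def render(instrs: list[str]) -> str:
    rows = multi_cycle_diagram(instrs)
    width = max(len(name) for name in instrs)
    header = " " * width + " | " + " ".join(f"CC{c + 1:<3}" for c in range(len(rows[0])))
    lines = [header]
    for name, row in zip(instrs, rows):
        lines.append(f"{name:<{width}} | " + " ".join(f"{cell:<5}" for cell in row))
    return "\n".join(lines)


print(render(["LDUR X10, [X1, #40]", "SUB X11, X2, X3", "ADD X12, X3, X4"]))
```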
98 Single-Cycle Pipeline Diagram
Shows the state of the pipeline in a single given clock cycle.
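The single-cycle view is the same information read down one column: in a given clock cycle, stage s holds the instruction that entered IF s cycles earlier. Here is a minimal sketch under the same no-stall assumption, using a hypothetical `snapshot` helper:

```python
from typing import Optional

STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def snapshot(instrs: list[str], cycle: int) -> dict[str, Optional[str]]:
    """Instruction occupying each stage during clock cycle `cycle` (1-based).

    Instruction i (0-based) enters IF in cycle i + 1 and advances one stage
    per cycle, so stage s holds instruction cycle - 1 - s. None = empty stage.
    """
    state: dict[str, Optional[str]] = {}
    for s, stage in enumerate(STAGES):
        i = cycle - 1 - s
        state[stage] = instrs[i] if 0 <= i < len(instrs) else None
    return state
```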
99 Pipelined Control (Simplified)
We now need to add a controller. We start by adding control signals to the existing design.
100 Pipelined Control Design
We first note that this design ignores the data and control hazards we discussed in Section 4.5. We will borrow as much as we can from the single-cycle design: we keep the same Main and ALU controllers, branch logic, control signals, and multiplexor design. The Main controller will create its control signals during the ID stage, and we will then pass the needed control signals forward via the pipeline registers.
101 Pipelined Control Design II
As the PC and the pipeline registers are written on every clock cycle, they don't need separate write signals. As each control signal is associated with a component that is active during only a single stage, we can associate each signal with a stage. IF: nothing needed here. ID: set Reg2Loc. EX: set ALUOp and ALUSrc. MEM: set Branch, MemRead, and MemWrite. WB: set MemToReg and RegWrite.
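The stage grouping above can be expressed as a lookup table. The signal values below are the single-cycle LEGv8 Main controller outputs as we recall them from the textbook's control table; check them against the figure before relying on them. None marks a don't-care, and ALUOp is shown as one 2-bit value. `CONTROL`, `STAGE_OF`, and `split_by_stage` are hypothetical names.

```python
# Main controller outputs per instruction class (None = don't care).
CONTROL = {
    "R-format": dict(Reg2Loc=0, ALUSrc=0, MemToReg=0, RegWrite=1,
                     MemRead=0, MemWrite=0, Branch=0, ALUOp=0b10),
    "LDUR": dict(Reg2Loc=None, ALUSrc=1, MemToReg=1, RegWrite=1,
                 MemRead=1, MemWrite=0, Branch=0, ALUOp=0b00),
    "STUR": dict(Reg2Loc=1, ALUSrc=1, MemToReg=None, RegWrite=0,
                 MemRead=0, MemWrite=1, Branch=0, ALUOp=0b00),
    "CBZ": dict(Reg2Loc=1, ALUSrc=0, MemToReg=None, RegWrite=0,
                MemRead=0, MemWrite=0, Branch=1, ALUOp=0b01),
}

# Stage in which each signal is used, per the slide (IF needs none).
STAGE_OF = {
    "Reg2Loc": "ID",
    "ALUOp": "EX", "ALUSrc": "EX",
    "Branch": "MEM", "MemRead": "MEM", "MemWrite": "MEM",
    "MemToReg": "WB", "RegWrite": "WB",
}


def split_by_stage(instr_class: str) -> dict[str, dict]:
    groups: dict[str, dict] = {"ID": {}, "EX": {}, "MEM": {}, "WB": {}}
    for signal, value in CONTROL[instr_class].items():
        groups[STAGE_OF[signal]][signal] = value
    return groups
```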
102 Pipelined Control Propagation
Control signals are derived from the instruction during the ID stage. Signals needed in the ID stage are kept local. The rest are saved into the pipeline registers and passed forward to the stage where they are needed.
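The propagation rule can be sketched as a control bundle that shrinks as the instruction advances. ID keeps Reg2Loc, and ID/EX receives the EX, MEM, and WB groups. Each later stage consumes its own group and passes the rest on. The load's signal values are assumptions taken from the single-cycle control table, and `propagate` is a hypothetical helper.

```python
# Assumed control bundle for a load (LDUR), grouped by the stage that uses it.
LDUR_CONTROL = {
    "ID": {"Reg2Loc": None},  # don't care for a load; used locally in ID
    "EX": {"ALUSrc": 1, "ALUOp": 0b00},
    "MEM": {"Branch": 0, "MemRead": 1, "MemWrite": 0},
    "WB": {"MemToReg": 1, "RegWrite": 1},
}


def propagate(control: dict[str, dict]):
    """Yield (pipeline register, control groups it holds) as the instruction advances."""
    # ID keeps its own signals local; everything else goes into ID/EX.
    carried = {stage: control[stage] for stage in ("EX", "MEM", "WB")}
    for reg, used_next in [("ID/EX", "EX"), ("EX/MEM", "MEM"), ("MEM/WB", "WB")]:
        yield reg, dict(carried)
        # The stage after this register consumes its group; the rest moves on.
        del carried[used_next]
```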
103 Pipelined Control
The next slide shows the full design, including the pipelined control signals and the stage in which each is used. As instruction bits 31:21 are used by the ALU control unit in the EX stage, we must add them to the ID/EX register.
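Here is a minimal sketch of the EX-stage ALU control unit, which reads the copy of bits 31:21 saved in ID/EX. The 4-bit ALU control codes and the R-format opcodes follow the textbook's ALU control table as we understand it. The sketch covers only ADD, SUB, AND, and ORR, and all names are hypothetical.

```python
# 4-bit ALU control inputs (assumed from the textbook's ALU control table).
ALU_AND, ALU_ORR, ALU_ADD, ALU_SUB, ALU_PASS_B = 0b0000, 0b0001, 0b0010, 0b0110, 0b0111

# R-format opcodes (instruction bits 31:21) handled by this sketch.
R_OPCODES = {
    0b10001011000: ALU_ADD,  # ADD
    0b11001011000: ALU_SUB,  # SUB
    0b10001010000: ALU_AND,  # AND
    0b10101010000: ALU_ORR,  # ORR
}


def alu_control(alu_op: int, opcode: int) -> int:
    """ALUOp comes from the ID/EX control bits; opcode is bits 31:21 saved in ID/EX."""
    if alu_op == 0b00:  # LDUR / STUR: address = base + offset
        return ALU_ADD
    if alu_op == 0b01:  # CBZ: pass input b through so Zero tests the register
        return ALU_PASS_B
    if alu_op == 0b10:  # R-format: decode the opcode field
        return R_OPCODES[opcode]
    raise ValueError(f"unexpected ALUOp {alu_op:#04b}")
```

Without the opcode copy in ID/EX, the EX stage would see only the instruction currently in IF/ID, the same mismatch as the write-register error.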
104 Pipelined Control II
105 Read for Own Interest
Read Sections 4.7, 4.8, 4.9, and 4.10 for your own interest.
106 Read On Your Own
Read Sections 4.14 and 4.15 on your own.