4.13 Advanced Topic: An Introduction to Digital Design Using a Hardware Design Language 345.e1

Size: px

Start display at page:

Download "4.13 Advanced Topic: An Introduction to Digital Design Using a Hardware Design Language 345.e1"

Brendan Johnson
5 years ago
Views:

1 .3 Advanced Topic: An Introdction to Digital Design Using a Hardware Design Langage 35.e.3 Advanced Topic: An Introdction to Digital Design Using a Hardware Design Langage to Describe and odel a Pipeline and ore Pipelining Illstrations Th is online section covers hardware description langages and then gives a dozen eamples of pipeline diagrams, starting on page 366.e8. As mentioned in Appi A, Verilog can describe processors for simlation or with the intention that the Verilog specification be synthesized. To achieve acceptable synthesis reslts in size and speed, and a behavioral specification inted for synthesis mst careflly delineate the highly combinational portions of the design, sch as a path, from the control. The path can then be synthesized sing available libraries. A Verilog specification inted for synthesis is sally longer and more comple. We start with a behavioral model of the five-stage pipeline. To illstrate the dichotomy between behavioral and synthesizable designs, we then give two Verilog descriptions of a mltiple-cycle-per-instrction RISC-V processor: one inted solely for simlations and one sitable for synthesis. Using Verilog for Behavioral Specification with Simlation for the Five-Stage Pipeline Figre e.3. shows a Verilog behavioral description of the pipeline that handles instrctions as well as loads and stores. It does not accommodate branches (even incorrectly!), which we postpone inclding ntil later in the chapter. Becase Verilog lacks the ability to define s with named fields sch as strctres in C, we se several indepent s for each pipeline. We name these s with a prefi sing the same convention; hence, IFIDIR is the IR portion of the IFID pipeline. Th is version is a behavioral description not inted for synthesis. s take the same nmber of clock cycles as or hardware design, bt the control is done in a simpler fashion by repeatedly decoding fields of the instrction in each pipe stage. Becase of this difference, the instrction (IR) is needed throghot the pipeline, and the entire IR is passed from pipe stage to pipe stage. As yo read the Verilog descriptions in this chapter, remember that the actions in the always block all occr in parallel on every clock cycle. Since there are no blocking assignments, the order of the events within the always block is arbitrary.

2 35.e.3 Advanced Topic: An Introdction to Digital Design Using a Hardware Design Langage modle RISCVCPU (clock); // opcodes parameter LD = 7'b_, SD = 7'b_, BEQ = 7'b_, NOP = 3'h_3, op = 7'b_; inpt clock; reg [63:], Regs[:3], IDA, IDB, EB, EOt, EVale; reg [3:] Iemory[:3], Demory[:3], // separate memories IFIDIR, IDIR, EIR, EIR; // pipeline s wire [:] IFIDrs, IFIDrs, Erd; // Access fields wire [6:] IDop, Eop, Eop; // Access opcodes wire [63:] Ain, Bin; // the inpts // These assignments define fields from the pipeline s assign IFIDrs = IFIDIR[9:5]; // rs field assign IFIDrs = IFIDIR[:]; // rs field assign IDop = IDIR[6:]; // the opcode assign Eop = EIR[6:]; // the opcode assign Eop = EIR[6:]; // the opcode assign Erd = EIR[:7]; // rd field // Inpts to the come directly from the pipeline s assign Ain = IDA; assign Bin = IDB; integer i; // sed to initialize s initial begin = ; IFIDIR = NOP; IDIR = NOP; EIR = NOP; EIR = NOP; // pt NOPs in pipeline s for (i=;i<=3;i=i+) Regs[i] = i; // initialize s--jst so they aren't cares // Remember that ALL these actions happen every pipe stage and with the se of <= they happen in parallel! clock) begin // first instrction in the pipeline is being fetched // Fetch & increment IFIDIR <= Iemory[ >> ]; <= + ; // second instrction in pipeline is fetching s IDA <= Regs[IFIDrs]; IDB <= Regs[IFIDrs]; // get two s IDIR <= IFIDIR; // pass along IR--can happen anywhere, since this affects net stage only! // third instrction is doing address calclation or operation if (IDop == LD) EOt <= IDA + {{53{IDIR[3]}}, IDIR[3:]}; else if (IDop == SD) EOt <= IDA + {{53{IDIR[3]}}, IDIR [3:5], IDIR[:7]}; else if (IDop == op) case (IDIR[3:5]) // case for the varios R-type instrctions : EOt <= Ain + Bin; // add operation FIGURE e.3. A Verilog behavioral model for the RISC-V five-stage pipeline, ignoring branch and hazards. As in the design earlier in Chapter, we se separate instrction and memories, which wold be implemented sing separate caches as we describe in Chapter 5.

3 .3 Advanced Topic: An Introdction to Digital Design Using a Hardware Design Langage 35.e3 defalt: ; // other R-type operations: sbtract, SLT, etc. case EIR <= IDIR; EB <= IDB; // pass along the IR & B // em stage of pipeline if (Eop == op) EVale <= EOt; // pass along reslt else if (Eop == LD) EVale <= Demory[EOt >> ]; else if (Eop == SD) Demory[EOt >> ] <= EB; //store EIR <= EIR; // pass along IR // stage if (((Eop == LD) (Eop == op)) && (Erd!= )) // pdate s if load/ operation and destination not Regs[Erd] <= EVale; modle FIGURE e.3. A Verilog behavioral model for the RISC-V five-stage pipeline, ignoring branch and hazards. (Contined) Implementing Forwarding in Verilog To et the Verilog model frther, Figre e.3. shows the addition of forwarding logic for the case when the sorce and destination are instrctions. Neither load stalls nor branches are handled; we will add these shortly. The changes from the earlier Verilog description are highlighted. Someone has proposed moving the write for a reslt from an instrction from the to the E stage, pointing ot that this wold redce the maimm length of forwards from an instrction by one cycle. Which of the following is accrate reasons not to consider sch a change?. It wold not actally change the forwarding logic, so it has no advantage.. It is impossible to implement this change nder any circmstance since the write for the reslt mst stay in the same pipe stage as the write for a load reslt. 3. oving the write for instrctions wold create the possibility of writes occrring from two different instrctions dring the same clock cycle. Either an etra write port wold be reqired on the file or a strctral hazard wold be created.. The reslt of an instrction is not available in time to do the write dring E. The Behavioral Verilog with Stall Detection If we ignore branches, stalls for hazards in the RISC-V pipeline are confined to one simple case: loads whose reslts are crrently in the clock stage. Ths, eting the Verilog to handle a load with a destination that is either an instrction or an effective address calclation is reasonably straightforward, and Figre e.3.3 shows the few additions needed. Check Yorself

4 35.e.3 Advanced Topic: An Introdction to Digital Design Using a Hardware Design Langage modle RISCVCPU (clock); // opcodes parameter LD = 7'b_, SD = 7'b_, BEQ = 7'b_, NOP = 3'h_3, op = 7'b_; inpt clock; reg [63:], Regs[:3], IDA, IDB, EB, EOt, EVale; reg [3:] Iemory[:3], Demory[:3], // separate memories IFIDIR, IDIR, EIR, EIR; // pipeline s wire [:] IFIDrs, IFIDrs, IDrs, IDrs, Erd, Erd; // Access fields wire [6:] IDop, Eop, Eop; // Access opcodes wire [63:] Ain, Bin; // the inpts // declare the bypass signals wire bypassafrome, bypassafromin, bypassbfrome, bypassbfromin, bypassafromldin, bypassbfromldin; assign IFIDrs = IFIDIR[9:5]; assign IFIDrs = IFIDIR[:]; assign IDop = IDIR[6:]; assign IDrs = IDIR[9:5]; assign IDrs = IDIR[:]; assign Eop = EIR[6:]; assign Erd = EIR[:7]; assign Eop = EIR[6:]; assign Erd = EIR[:7]; // The bypass to inpt A from the E stage for an operation assign bypassafrome = (IDrs == Erd) && (IDrs!= ) && (Eop == op); // The bypass to inpt B from the E stage for an operation assign bypassbfrome = (IDrs == Erd) && (IDrs!= ) && (Eop == op); // The bypass to inpt A from the stage for an operation assign bypassafromin = (IDrs == Erd) && (IDrs!= ) && (Eop == op); // The bypass to inpt B from the stage for an operation assign bypassbfromin = (IDrs == Erd) && (IDrs!= ) && (Eop == op); // The bypass to inpt A from the stage for an LD operation assign bypassafromldin = (IDrs == Erd) && (IDrs!= ) && (Eop == LD); // The bypass to inpt B from the stage for an LD operation assign bypassbfromldin = (IDrs == Erd) && (IDrs!= ) && (Eop == LD); // The A inpt to the is bypassed from E if there is a bypass there, // Otherwise from if there is a bypass there, and otherwise comes from the ID assign Ain = bypassafrome? EOt : (bypassafromin bypassafromldin)? EVale : IDA; // The B inpt to the is bypassed from E if there is a bypass there, // Otherwise from if there is a bypass there, and otherwise comes from the ID FIGURE e.3. A behavioral definition of the five-stage RISC-V pipeline with bypassing to operations and address calclations. The code added to Figre e.3. to handle bypassing is highlighted. Becase these bypasses only reqire changing where the inpts come from, the only changes reqired are in the combinational logic responsible for selecting the inpts. (contines on net page)

5 .3 Advanced Topic: An Introdction to Digital Design Using a Hardware Design Langage 35.e5 assign Bin = bypassbfrome? EOt : (bypassbfromin bypassbfromldin)? EVale: IDB; integer i; // sed to initialize s initial begin = ; IFIDIR = NOP; IDIR = NOP; EIR = NOP; EIR = NOP; // pt NOPs in pipeline s for (i=;i<=3;i=i+) Regs[i] = i; // initialize s--jst so they aren't cares // Remember that ALL these actions happen every pipe stage and with the se of <= they happen in parallel! clock) begin // first instrction in the pipeline is being fetched // Fetch & increment IFIDIR <= Iemory[ >> ]; <= + ; // second instrction in pipeline is fetching s IDA <= Regs[IFIDrs]; IDB <= Regs[IFIDrs]; // get two s IDIR <= IFIDIR; // pass along IR--can happen anywhere, since this affects net stage only! // third instrction is doing address calclation or operation if (IDop == LD) EOt <= IDA + {{53{IDIR[3]}}, IDIR[3:]}; else if (IDop == SD) EOt <= IDA + {{53{IDIR[3]}}, IDIR[3:5], IDIR[:7]}; else if (IDop == op) case (IDIR[3:5]) // case for the varios R-type instrctions : EOt <= Ain + Bin; // add operation defalt: ; // other R-type operations: sbtract, SLT, etc. case EIR <= IDIR; EB <= IDB; // pass along the IR & B // em stage of pipeline if (Eop == op) EVale <= EOt; // pass along reslt else if (Eop == LD) EVale <= Demory[EOt >> ]; else if (Eop == SD) Demory[EOt >> ] <= EB; //store EIR <= EIR; // pass along IR // stage if (((Eop == LD) (Eop == op)) && (Erd!= )) // pdate s if load/ operation and destination not Regs[Erd] <= EVale; modle FIGURE e.3. A behavioral definition of the five-stage RISC-V pipeline with bypassing to operations and address calclations. (Contined)

6 modle RISCVCPU (clock); // opcodes parameter LD = 7'b_, SD = 7'b_, BEQ = 7'b_, NOP = 3'h_3, op = 7'b_; inpt clock; reg [63:], Regs[:3], IDA, IDB, EB, EOt, EVale; reg [3:] Iemory[:3], Demory[:3], // separate memories IFIDIR, IDIR, EIR, EIR; // pipeline s wire [:] IFIDrs, IFIDrs, IDrs, IDrs, Erd, Erd; // Access fields wire [6:] IDop, Eop, Eop; // Access opcodes wire [63:] Ain, Bin; // the inpts // declare the bypass signals wire bypassafrome, bypassafromin, bypassbfrome, bypassbfromin, bypassafromldin, bypassbfromldin; wire stall; // stall signal assign IFIDrs = IFIDIR[9:5]; assign IFIDrs = IFIDIR[:]; assign IDop = IDIR[6:]; assign IDrs = IDIR[9:5]; assign IDrs = IDIR[:]; assign Eop = EIR[6:]; assign Erd = EIR[:7]; assign Eop = EIR[6:]; assign Erd = EIR[:7]; // The bypass to inpt A from the E stage for an operation assign bypassafrome = (IDrs == Erd) && (IDrs!= ) && (Eop == op); // The bypass to inpt B from the E stage for an operation assign bypassbfrome = (IDrs == Erd) && (IDrs!= ) && (Eop == op); // The bypass to inpt A from the stage for an operation assign bypassafromin = (IDrs == Erd) && (IDrs!= ) && (Eop == op); // The bypass to inpt B from the stage for an operation assign bypassbfromin = (IDrs == Erd) && (IDrs!= ) && (Eop == op); // The bypass to inpt A from the stage for an LD operation assign bypass AfromLDin = (IDrs == Erd) && (IDrs!= ) && (Eop == LD); // The bypass to inpt B from the stage for an LD operation assign bypassbfromldin = (IDrs == Erd) && (IDrs!= ) && (Eop == LD); // The A inpt to the is bypassed from E if there is a bypass there, // Otherwise from if there is a bypass there, and otherwise comes from the ID assign Ain = bypassafrome? EOt : (bypassafromin bypassafromldin)? EVale : IDA; // The B inpt to the is bypassed from E if there is a bypass there, // Otherwise from if there is a bypass there, and otherwise comes from the ID assign Bin = bypassbfrome? EOt : (bypassbfromal Uin bypassbfromldin)? EVale: IDB; FIGURE e.3.3 A behavioral definition of the five-stage RISC-V pipeline with stalls for loads when the destination is an instrction or effective address calclation. The changes from Figre e.3. are highlighted. (contines on net page)

7 // The signal for detecting a stall based on the se of a reslt from LW assign stall = (Eop == LD) && ( // sorce instrction is a load (((IDop == LD) (IDop == SD)) && (IDrs == Erd)) // stall for address calc ((IDop == op) && ((IDrs == Erd) (IDrs == Erd)))); // se integer i; // sed to initialize s initial begin = ; IFIDIR = NOP; IDIR = NOP; EIR = NOP; EIR = NOP; // pt NOPs in pipeline s for (i=;i<=3;i=i+) Regs[i] = i; // initialize s--jst so they aren't cares // Remember that ALL these actions happen every pipe stage and with the se of <= they happen in parallel! clock) begin if (~stall) begin // the first three pipeline stages stall if there is a load hazard // first instrction in the pipeline is being fetched // Fetch & increment IFIDIR <= Iemory[ >> ]; <= + ; // second instrction in pipeline is fetching s IDA <= Regs[IFIDrs]; IDB <= Regs[IFIDrs]; // get two s IDIR <= IFIDIR; // pass along IR--can happen anywhere, since this affects net stage only! // third instrction is doing address calclation or operation if (IDop == LD) EOt <= IDA + {{53{IDIR[3]}}, IDIR[3:]}; else if (IDop == SD) EOt <= IDA + {{53{IDIR[3]}}, IDIR[3:5], IDIR[:7]}; else if (IDop == op) case (IDIR[3:5]) // case for the varios R-type instrctions : EOt <= Ain + Bin; // add operation defalt: ; // other R-type operations: sbtract, SLT, etc. case EIR <= IDIR; EB <= IDB; // pass along the IR & B else EIR <= NOP; // Freeze first three stages of pipeline; inject a nop into the otpt // em stage of pipeline if (Eop == op) EVale <= EOt; // pass along reslt else if (Eop == LD) EVale <= Demory[EOt >> ]; else if (Eop == SD) Demory[EOt >> ] <= EB; //store EIR <= EIR; // pass along IR // stage if (((Eop == LD) (Eop == op)) && (Erd!= )) // pdate s if load/ operation and destination not Regs[Erd] <= EVale; modle FIGURE e.3.3 A behavioral definition of the five-stage RISC-V pipeline with stalls for loads when the destination is an instrction or effective address calclation. (Contined)

8 35.e8.3 Advanced Topic: An Introdction to Digital Design Using a Hardware Design Langage Check Yorself Someone has asked abot the possibility of hazards occrring throgh, contrary to throgh a. Which of the following statements abot sch hazards is tre?. Since accesses only occr in the E stage, all operations are done in the same order as instrction eection, making sch hazards impossible in this pipeline.. Sch hazards are possible in this pipeline; we jst have not discssed them yet. 3. No pipeline can ever have a hazard involving, since it is the programmer s job to keep the order of references accrate.. emory hazards may be possible in some pipelines, bt they cannot occr in this particlar pipeline. 5. Althogh the pipeline control wold be obligated to maintain ordering among references to avoid hazards, it is impossible to design a pipeline where the references cold be ot of order. Implementing the Branch Hazard Logic in Verilog We can et or Verilog behavioral model to implement the control for branches. We add the code to model branch eqal sing a predict not taken strategy. The Verilog code is shown in Figre e.3.. It implements the branch hazard by detecting a taken branch in ID and sing that signal to sqash the instrction in IF (by setting the IR to 3, which is an effective NOP in RISC-V); in addition, the is assigned to the branch target. Note that to prevent an nepected latch, it is important that the is clearly assigned on every path throgh the always block; hence, we assign the in a single if statement. Lastly, note that althogh Figre e.3. incorporates the basic logic for branches and control hazards, spporting branches reqires additional bypassing and hazard detection, which we have not inclded. Using Verilog for Behavioral Specification with Synthesis To demonstrate the contrasting types of Verilog, we show two descriptions of a different, nonpipelined implementation style of RISC-V that ses mltiple clock cycles per instrction. (Since some instrctors make a synthesizable description of the RISC-V pipeline project for a class, we chose not to inclde it here. It wold also be long.) Figre e.3.5 gives a behavioral specification of a mlticycle implementation of the RISC-V processor. Becase of the se of behavioral operations, it wold be difficlt to synthesize a separate path and control nit with any reasonable efficiency. This version demonstrates another approach to the control by sing a ealy finite-state machine (see discssion in Section A. of Appi A ). The se of a ealy machine, which allows the otpt to dep both on inpts and the crrent state, allows s to decrease the total nmber of states.

9 modle RISCVCPU (clock); // opcodes parameter LD = 7'b_, SD = 7'b_, BEQ = 7'b_, NOP = 3'h_3, op = 7'b_; inpt clock; reg [63:], Regs[:3], IDA, IDB, EB, EOt, EVale; reg [3:] Iemory[:3], Demory[:3], // separate memories IFIDIR, IDIR, EIR, EIR; // pipeline s wire [:] IFIDrs, IFIDrs, IDrs, IDrs, Erd, Erd; // Access fields wire [6:] IFIDop, IDop, Eop, Eop; // Access opcodes wire [63:] Ain, Bin; // the inpts // declare the bypass signals wire bypassafrome, bypassafromin, bypassbfrome, bypassbfromin, bypassafromldin, bypassbfromldin; wire stall; // stall signal wire takebranch; assign IFIDop = IFIDIR[6:]; assign IFIDrs = IFIDIR[9:5]; assign IFIDrs = IFIDIR[:]; assign IDop = IDIR[6:]; assign IDrs = IDIR[9:5]; assign IDrs = IDIR[:]; assign Eop = EIR[6:]; assign Erd = EIR[:7]; assign Eop = EIR[6:]; assign Erd = EIR[:7]; // The bypass to inpt A from the E stage for an operation assign bypassafrome = (IDrs == Erd) && (IDrs!= ) && (Eop == op); // The bypass to inpt B from the E stage for an operation assign bypassbfrome = (IDrs == Erd) && (IDrs!= ) && (Eop == op); // The bypass to inpt A from the stage for an operation assign bypas safromin = (IDrs == Erd) && (IDrs!= ) && (Eop == op); // The bypass to inpt B from the stage for an operation assign bypassbfromin = (IDrs == Erd) && (IDrs!= ) && (Eop == op); // The bypass to inpt A from the stage for an LD operation assign bypassafromldin = (IDrs == Erd) && (IDrs!= ) && (Eop == LD); // The bypass to inpt B from the stage for an LD operation assign bypassbfromldin = (IDrs == Erd) && (ID rs!= ) && (Eop == LD); // The A inpt to the is bypassed from E if there is a bypass there, // Otherwise from if there is a bypass there, and otherwise comes from the ID assign Ain = bypassafrome? EOt : (bypassafromin bypassafromldin)? EVale : IDA; // The B inpt to the is bypassed from E if there is a bypass there, // Otherwise from if there is a bypass there, and otherwise comes from the ID assign Bin = bypassbfrome? EOt : (bypassbfromin bypassbfromldin)? EVale: FIGURE e.3. A behavioral definition of the five-stage RISC-V pipeline with stalls for loads when the destination is an instrction or effective address calclation. The changes from Figre e.3. are highlighted. (contines on net page)

10 IDB; // The signal for detecting a stall based on the se of a reslt from LW assign stall = (Eop == LD) && ( // sorce instrction is a load (((IDop == LD) (IDop == SD)) && (IDrs == Erd)) // stall for address calc ((IDop == op) && ((IDrs == Erd) (IDrs == Erd)))); // se // Signal for a taken branch: instrction is BEQ and s are eqal assign takebranch = (IFIDop == BEQ) && (Regs[IFIDrs] == Regs[IFIDrs]); integer i; // sed to initialize s initial begin = ; IFIDIR = NOP; IDIR = NOP; EIR = NOP; EIR = NOP; // pt NOPs in pipeline s for (i=;i<=3;i=i+) Regs[i] = i; // initialize s--jst so they aren't cares // Remember that ALL these actions happen every pipe stage and with the se of <= they happen in parallel! clock) begin if (~stall) begin // the first three pipeline stages stall if there is a load hazard if (~takebranch) begin // first instrction in the pipeline is being fetched normally IFIDIR <= Iemory[ >> ]; <= + ; else begin // a taken branch is in ID; instrction in IF is wrong; insert a NOP and reset the IFIDIR <= NOP; <= + {{5{IFIDIR[3]}}, IFIDIR[7], IFIDIR[3:5], IFIDIR[:8], 'b}; // second instrction in pipeline is fetching s IDA <= Regs[IFIDrs]; IDB <= Regs[IFIDrs]; // get two s IDIR <= IFIDIR; // pass along IR--can happen anywhere, since this affects net stage only! // third instrction is doing addre ss calclation or operation if (IDop == LD) EOt <= IDA + {{53{IDIR[3]}}, IDIR[3:]}; else if (IDop == SD) EOt <= IDA + {{53{IDIR[3]}}, IDIR[3:5], IDIR[:7]}; else if (IDop == op) case (IDIR[3:5]) // case for the varios R-type instrctions : EOt <= Ain + Bin; // add operation FIGURE e.3. A behavioral definition of the five-stage RISC-V pipeline with stalls for loads when the destination is an instrction or effective address calclation. (Contined)

11 .3 Advanced Topic: An Introdction to Digital Design Using a Hardware Design Langage 35.e defalt: ; // other R-type operations: sbtract, SLT, etc. case EIR <= IDIR; EB <= IDB; // pass along the IR & B else EIR <= NOP; // Freeze first three stages of pipeline; inject a nop into the otpt // em stage of pipeline if (Eop == op) EVale <= EOt; // pass along reslt else if (Eop == LD) EVale <= Demory[EOt >> ]; else if (Eop == SD) Demory[EOt >> ] <= EB; //store EIR <= EIR; // pass along IR // stage if (((Eop == LD) (Eop == op)) && (E rd!= )) // pdate s if load/ operation and destination not Regs[Erd] <= EVale; modle FIGURE e.3. A behavioral definition of the five-stage RISC-V pipeline with stalls for loads when the destination is an instrction or effective address calclation. (Contined) Since a version of the RISC-V design inted for synthesis is considerably more comple, we have relied on a nmber of Verilog modles that were specified in Appi A, inclding the following: The -to- mltipleor shown in Figre A.., and the -to- mltipleor that can be trivially derived based on the -to- mltipleor. The RISC-V shown in Figre A.5.5. The RISC-V control defined in Figre A.5.6. The RISC-V file defined in Figre A.8.. Now, let s look at a Verilog version of the RISC-V processor inted for synthesis. Figre e.3.6 shows the strctral version of the RISC-V path. Figre e.3.7 ses the path modle to specify the RISC-V CPU. This version also demonstrates another approach to implementing the control nit, as well as some optimizations that rely on relationships between varios control signals. Observe that the state machine specification only provides the seqencing actions. The setting of the control lines is done with a series of assign statements that dep on the state as well as the opcode field of the instrction. If one were to fold the setting of the control into the state specification, this wold look like a ealy-style finite-state control nit. Becase the setting of the control lines is specified sing assign statements otside of the always block, most logic synthesis systems will generate a small implementation of a finite-state machine

12 35.e.3 Advanced Topic: An Introdction to Digital Design Using a Hardware Design Langage modle RISCVCPU (clock); parameter LD = 7'b_, SD = 7'b_, BEQ = 7'b_, op = 7'b_; inpt clock; //the clock is an eternal inpt // The architectrally visible s and scratch s for implementation reg [63:], Regs[:3], Ot, DR, A, B; reg [3:] emory [:3], IR; reg [:] state; // processor state wire [6:] opcode; // se to get opcode easily wire [63:] ImmGen; // sed to generate immediate assign opcode = IR[6:]; // opcode is lower 7 bits assign ImmGen = (opcode == LD)? {{53{IR[3]}}, IR[3:]} : /* (opcode == SD) */{{53{IR[3]}}, IR[3:5], IR[:7]}; assign Offset = {{5{IR[3]}}, IR[7], IR[3:5], IR[:8], 'b}; // set the to and start the control in state initial begin = ; state = ; // The state machine--triggered on a rising clock clock) begin Regs[] <= ; // shortct way to make sre R is always case (state) //action deps on the state : begin // first step: fetch the instrction, increment, go to net state IR <= emory[ >> ]; <= + ; state <= ; // net state : begin // second step: decode, fetch, also compte branch address A <= Regs[IR[9:5]]; B <= Regs[IR[:]]; Ot <= + Offset; // compte -relative branch target state <= 3; 3: begin // third step: Load-store eection, eection, Branch completion if ((opcode == LD) (opcode == SD)) begin Ot <= A + ImmGen; // compte effective address state <= ; else if (opcode == op) begin case (IR[3:5]) // case for the varios R-type instrctions : Ot <= A + B; // add operation defalt: ; // other R-type operations: sbtract, SLT, etc. case state <= ; else if (opcode == BEQ) begin if (A == B) begin <= Ot; // branch taken--pdate FIGURE e.3.5 A behavioral specification of the mlticycle RISC-V design. This has the same cycle behavior as the mlticycle design, bt is prely for simlation and specification. It cannot be sed for synthesis. (contines on net page)

13 .3 Advanced Topic: An Introdction to Digital Design Using a Hardware Design Langage 35.e3 state <= ; else ; // other opcodes or eception for ndefined instrction wold go here : begin if (opcode == op) begin // Operation Regs[IR[:7]] <= Ot; // write the reslt state <= ; // R-type finishes else if (opcode == LD) begin // load instrction DR <= emory[ot >> ]; // read the state <= 5; // net state else if (opcode == SD) begin // store instrction emory[ot >> ] <= B; // write the state <= ; // retrn to state else ; // other instrctions go here 5: begin // LD is the only instrction still in eection Regs[IR[:7]] <= DR; // write the DR to the state <= ; // complete an LD instrction case modle FIGURE e.3.5 A behavioral specification of the mlticycle RISC-V design. (Contined) that determines the setting of the state and then ses eternal logic to derive the control inpts to the path. In writing this version of the control, we have also taken advantage of a nmber of insights abot the relationship between varios control signals as well as sitations where we don t care abot the control signal vale; some eamples of these are given in the following elaboration. ore Illstrations of Eection on the Hardware To redce the cost of this book, starting with the third edition, we moved sections and figres that were sed by a minority of instrctors online. This sbsection recaptres those figres for readers who wold like more spplemental material to nderstand pipelining better. These are all single-clock-cycle pipeline diagrams, which take many figres to illstrate the eection of a seqence of instrctions. The three eamples are respectively for code with no hazards, an eample of forwarding on the pipelined implementation, and an eample of bypassing on the pipelined implementation.

14 modle path (Op, emtoreg, em, em, IorD, Reg, IR,, Cond, SrcA, SrcB, Sorce, opcode, clock); // the control inpts + clock parameter LD = 7'b_, SD = 7'b_; inpt [:] Op, SrcB; // -bit control signals inpt emtoreg, em, em, IorD, Reg, IR,, Cond, SrcA, Sorce, clock; // -bit control signals otpt [6:] opcode; // opcode is needed as an otpt by control reg [63:], DR, Ot; // CPU state + some temporaries reg [3:] emory[:3], IR; // CPU state + some temporaries wire [63:] A, B, SignEtOffset, Offset, ResltOt, Vale, JmpAddr,, Ain, Bin, emot; // these are signals derived from s wire [3:] Ctl; // the control lines wire Zero; // the Zero ot signal from the initial = ; //start the at //Combinational signals sed in the path // sing word address with either Ot or as the address sorce assign emot = em? emory[(iord? Ot : ) >> ] : ; assign opcode = IR[6:]; // opcode shortct // Get the write either from the Ot or from the DR assign = emtoreg? DR : Ot; // Generate immediate assign ImmGen = (opcode == LD)? {{53{IR[3]}}, IR[3:]} : /* (opcode == SD) */{{53{IR[3]}}, IR[3:5], IR[:7]}; // Generate pc offset for branches assign Offset = {{5{IR[3]}}, IR[7], IR[3:5], IR[:8], 'b}; // The A inpt to the is either the rs or the assign Ain = SrcA? A : ; // inpt is or A // Creates an instance of the control nit (see the modle defined in Figre B.5.6 // Inpt Op is control-nit set and sed to describe the instrction class as in Chapter // Inpt IR[3:5] is the fnction code field for an instrction // Otpt Ctl are the actal control bits as in Chapter alcontroller (Op, IR[3:5], Ctl); // control nit // Creates a -to- mltipleor sed to select the sorce of the net // Inpts are ResltOt (the incremented ), Ot (the branch address) // Sorce is the selector inpt and Vale is the mltipleor otpt ltto src (ResltOt, Ot, Sorce, Vale); // Creates a -to- mltipleor sed to select the B inpt of the // Inpts are B, constant, generated immediate, offset // SrcB is the select or inpt // Bin is the mltipleor otpt ltto Binpt (B, 6'd, ImmGen, Offset, SrcB, Bin); // Creates a RISC-V // Inpts are Ctl (the control), vale inpts (Ain, Bin) // Otpts are ResltOt (the 6-bit otpt) and Zero (zero detection otpt) RISCV (Ctl, Ain, Bin, ResltOt, Zero); // the FIGURE e.3.6 A Verilog version of the mlticycle RISC-V path that is appropriate for synthesis. This path relies on several nits from Appi A. Initial statements do not synthesize, and a version sed for synthesis wold have to incorporate a reset signal that had this effect. Also note that resetting R to on every clock is not the best way to ensre that R stays at ; instead, modifying the file modle to prodce whenever R is read and to ignore writes to R wold be a more efficient soltion. (contines on net page)

15 .3 Advanced Topic: An Introdction to Digital Design Using a Hardware Design Langage 35.e5 // Creates a RISC-V file // Inpts are the rs and rs fields of the IR sed to specify which s to read, // reg (the write nmber), (the to be written), // Reg (indicates a write), the clock // Otpts are A and B, the s read file regs (IR[9:5], IR[:], IR[:7],, Reg, A, B, clock); // Register file // The clock-triggered actions of the path clock) begin if (em) emory[ot >> ] <= B; // --mst be a store Ot <= ResltOt; // Save the reslt for se on a later clock cycle if (IR) IR <= emot; // the IR if an instrction fetch DR <= emot; // Always save the read vale // The is written both conditionally (controlled by ) and nconditionally modle FIGURE e.3.6 A Verilog version of the mlticycle RISC-V path that is appropriate for synthesis. (Contined) No Hazard Illstrations On page 85, we gave the eample code seqence ld sb add ld add, (),, 3, 3, 3, 8(), 5, 6 Figres e. and e.3 showed the mltiple-clock-cycle pipeline diagrams for this two-instrction seqence eecting across si clock cycles. Figres e.3.8 throgh e.3. show the corresponding single-clock-cycle pipeline diagrams for these two instrctions. Note that the order of the instrctions differs between these two types of diagrams: the newest instrction is at the bottom and to the right of the mltiple-clockcycle pipeline diagram, and it is on the left in the single-clock-cycle pipeline diagram. ore Eamples To nderstand how pipeline control works, let s consider these five instrctions going throgh the pipeline: ld sb and or add, (),, 3,, 5 3, 6, 7, 8, 9

16 35.e6.3 Advanced Topic: An Introdction to Digital Design Using a Hardware Design Langage modle RISCVCPU (clock); parameter LD = 7'b_, SD = 7'b_, BEQ = 7'b_, op = 7'b_; inpt clock; reg [:] state; wire [:] Op, SrcB; wire [6:] opcode; wire emtoreg, em, em, IorD, Reg, IR,, Cond, SrcA, Sorce, emoryop; // Create an instance of the RISC-V path, the inpts are the control signals; opcode is only otpt path RISCVDP (Op, emtoreg, em, em, IorD, Reg, IR,, Cond, SrcA, SrcB, Sorce, opcode, clock); initial begin state = ; // start the state machine in state // These are the definitions of the control signals assign emoryop = (opcode == LD) (opcode == SD); // a operation assign Op = ((state == ) (state == ) ((state == 3) && emoryop))? 'b : // add ((state == 3) && (opcode == BEQ))? 'b : 'b; // sbtract or se fnction code assign emtoreg = ((state == ) && (opcode == op))? : ; assign em = (state == ) ((state == ) && (opcode == LD)); assign em = (state == ) && (opcode == SD); assign IorD = (state == )? : ; assign Reg = (state == 5) ((state == ) && (opcode == op)); assign IR = (state == ); assign = (state == ); assign Cond = (state == 3) && (opcode == BEQ); assign SrcA = ((state == ) (state == ))? : ; assign SrcB = ((state == ) ((state == 3) && (opcode == BEQ)))? 'b : (state == )? 'b : ((state == 3) && emoryop)? 'b : 'b; // operation or other assign Sorce = (state == )? : ; // Here is the state machine, which only has to seqence states clock) begin // all state pdates on a positive clock edge case (state) : state <= ; // nconditional net state : state <= 3; // nconditional net state 3: state <= (opcode == BEQ)? : ; // branch go back else net state : state <= (opcode == LD)? 5 : ; // R-type and SD finish 5: state <= ; // go back case modle FIGURE e.3.7 The RISC-V CPU sing the path from Figre e.3.6.

17 .3 Advanced Topic: An Introdction to Digital Design Using a Hardware Design Langage 35.e7 ld, () fetch /E E/ Add Add Sm 3 Imm Gen 6 Shift left Zero reslt Clock sb,, 3 ld, () fetch decode /E E/ Add 3 Imm Gen 6 Shift left Add Sm Zero reslt Clock FIGURE e.3.8 Single-cycle pipeline diagrams for clock cycles (top diagram) and (bottom diagram). This style of pipeline representation is a snapshot of every instrction eecting dring one clock cycle. Or eample has bt two instrctions, so at most two stages are identified in each clock cycle; normally, all five stages are occpied. The highlighted portions of the path are active in that clock cycle. The load is fetched in clock cycle and decoded in clock cycle, with the sbtract fetched in the second clock cycle. To make the figres easier to nderstand, the other pipeline stages are empty, bt normally there is an instrction in every pipeline stage.

18 35.e8.3 Advanced Topic: An Introdction to Digital Design Using a Hardware Design Langage sb,, 3 decode ld, () Eection /E E/ Add Add Sm 3 Imm Gen 6 Shift left Zero reslt Clock 3 sb,, 3 Eection ld, () emory /E E/ Add Shift left Add Sm Zero reslt 3 Imm Gen 6 Clock FIGURE e.3.9 Single-cycle pipeline diagrams for clock cycles 3 (top diagram) and (bottom diagram). In the third clock cycle in the top diagram, ld enters the stage. At the same time, sb enters ID. In the forth clock cycle (bottom path), ld moves into E stage, reading sing the address fond in /E at the beginning of clock cycle. At the same time, the sbtracts and then places the difference into /E at the of the clock cycle.

19 .3 Advanced Topic: An Introdction to Digital Design Using a Hardware Design Langage 35.e9 sb,, 3 emory ld, () back /E E/ Add Add Sm 3 Imm Gen 6 Shift left Zero reslt Clock 5 sb,, 3 back /E E/ Add 3 Imm Gen 6 Shift left Add Sm Zero reslt Clock 6 FIGURE e.3. Single-cycle pipeline diagrams for clock cycles 5 (top diagram) and 6 (bottom diagram). In clock cycle 5, ld completes by writing the in E/ into, and sb ss the difference in /E to E/. In the net clock cycle, sb writes the vale in E/ to.

20 35.e.3 Advanced Topic: An Introdction to Digital Design Using a Hardware Design Langage Figres e.3. throgh e.3.5 show these instrctions proceeding throgh the nine clock cycles it takes them to complete eection, highlighting what is active in a stage and identifying the instrction associated with each stage dring a clock cycle. If yo eamine them careflly, yo may notice: In Figre e.3.3 yo can see the seqence of the destination nmbers from left to right at the bottom of the pipeline s. The nmbers advance to the right dring each clock cycle, with the E/ pipeline spplying the nmber of the written dring the stage. When a stage is inactive, the vales of control lines that are deasserted are shown as or X (for don t care). Seqencing of control is embedded in the pipeline strctre itself. First, all instrctions take the same nmber of clock cycles, so there is no special control for instrction dration. Second, all control information is compted dring instrction decode and then passed along by the pipeline s. Forwarding Illstrations We can se the single-clock-cycle pipeline diagrams to show how forwarding operates, as well as how the control activates the forwarding paths. Consider the following code seqence in which the depences have been highlighted: sb,, 3 and,, 5 or,, add 9,, Figres e.3.6 and e.3.7 show the events in clock cycles 3 6 in the eection of these instrctions. Ths, in clock cycle 5, the forwarding nit selects the /E pipeline for the pper inpt to the and the E/ pipeline for the lower inpt to the. The following add instrction reads both, the target of the and instrction, and, which the sb instrction has already written. Notice that the prior two instrctions both write, so the forwarding nit mst pick the immediately preceding one (E stage). In clock cycle 6, the forwarding nit ths selects the /E pipeline, containing the reslt of the or instrction, for the pper inpt bt ses the non-forwarding vale for the lower inpt to the. Illstrating Pipelines with Stalls and Forwarding We can se the single-clock-cycle pipeline diagrams to show how the control for stalls works. Figres e.3.8 throgh e.3. show the single-cycle diagram for clocks throgh 7 for the following code seqence (depences highlighted): ld and or add, (),, 5,, 9,,

21 .3 Advanced Topic: An Introdction to Digital Design Using a Hardware Design Langage 35.e IF: ld, () ID: before<> : before<> E: before<3> : before<> /E E/ Add Clock Reg [3, ] [ 7] Shift left [3 ] Imm Gen control Add Sm Src Zero reslt Op Branch em em emtoreg IF: sb,, 3 ID: ld, () : before<> E: before<> : before<3> /E E/ ld Add Clock X Reg [3 ] Sign- et [3, ] [ 7] X 3 Shift left Add Sm control Src Zero reslt Op Branch em em emtoreg FIGURE e.3. Clock cycles and. The phrase before <i> means the i th instrction before ld. The ld instrction in the top path is in the IF stage. At the of the clock cycle, the ld instrction is in the pipeline s. In the second clock cycle, seen in the bottom path, the ld moves to the ID stage, and sb enters in the IF stage. Note that the vales of the instrction fields and the selected sorce s are shown in the ID stage. Hence, and the constant, the operands of ld, are written into the pipeline. The nmber, representing the destination nmber of ld, is also placed in. The top of the pipeline shows the control vales for ld to be sed in the remaining stages. These control vales can be read from the ld row of the table in Figre e.8.

22 35.e.3 Advanced Topic: An Introdction to Digital Design Using a Hardware Design Langage IF: and,, 5 ID: sb,, 3 : ld,... E: before<> : before<> sb /E E/ Add Clock 3 3 Reg [3 ] [3, ] [ 7] 3 Imm Gen X 8 Shift left 3 Add Sm Src Zero reslt control Op Branch em em emtoreg IF: or 3, 6, 7 ID: and,, 5 : sb,... E: ld,... : before<> /E E/ and Add Clock 5 Reg [3 ] [3, ] [ 7] Imm Gen X 7 5 Shift left 3 8 Add Sm Src control Zero reslt Op Branch em em emtoreg FIGURE e.3. Clock cycles 3 and. In the top diagram, ld enters the stage in the third clock cycle, adding and to form the address in the /E pipeline. (The ld instrction is written ld, pon reaching, becase the identity of instrction operands is not needed by or the sbseqent stages. In this version of the pipeline, the actions of, E, and dep only on the instrction and its destination or its target address.) At the same time, sb enters ID, reading s and 3, and the and instrction starts IF. In the forth clock cycle (bottom path), ld moves into E stage, reading sing the vale in / E as the address. In the same clock cycle, the sbtracts 3 from and places the difference into /E, reads s and 5 dring ID, and the or instrction enters IF. The two diagrams show the control signals being created in the ID stage and peeled off as they are sed in sbseqent pipe stages.

23 .3 Advanced Topic: An Introdction to Digital Design Using a Hardware Design Langage 35.e3 IF: add, 8, 9 ID: or 3, 6, 7 : and,... E: sb,... : ld,... /E E/ or Add Clock Reg [3 ] [3, ] [ 7] 6 7 Imm Gen X 6 Shift left Add Sm Src Zero reslt control Op Branch em em emtoreg IF: after<> ID: add, 8, 9 : or 3,... E: and,... : sb,... /E E/ add Add Clock Reg [3 ] [3, ] [ 7] Imm Gen X 8 9 Shift left Add Sm Src control Zero reslt Op Branch em 3 em emtoreg FIGURE e.3.3 Clock cycles 5 and 6. Wit h add, the final instrction in this eample, entering IF in the top path, all instrctions are engaged. By writing the in E/ into, ld completes; both the and the nmber are in E/. In the same clock cycle, sb ss the difference in /E to E/, and the rest of the instrctions move forward. In the net clock cycle, sb selects the vale in E/ to write to nmber, again fond in E/. The remaining instrctions play follow-theleader: the calclates the OR of 6 and 7 for the or instrction in the stage, and s 8 and 9 are read in the ID stage for the add instrction. The instrctions after add are shown as inactive jst to emphasize what occrs for the five instrctions in the eample. The phrase after <i> means the i th instrction after add.

24 35.e.3 Advanced Topic: An Introdction to Digital Design Using a Hardware Design Langage IF: after<> ID: after<> : add,... E: or 3,... : and,... /E E/ Add Clock 7 Reg [3 ] control Imm Gen [3, ] [ 7] Shift left 8 9 Add Sm Src Zero reslt Op Branch em em 3 emtoreg IF: after<3> ID: after<> : after<> E: add,... : or 3,... /E E/ Add Clock 8 3 Reg [3 ] [3, ] Imm Gen Shift left Add Sm Src control Zero reslt Branch em Op [ 7] 3 em emtoreg FIGURE e.3. Clock cycles 7 and 8. In the top path, the add instrction brings p the rear, adding the vales corresponding to s 8 and 9 dring the stage. The reslt of the or instrction is passed from /E to E/ in the E stage, and the stage writes the reslt of the and instrction in E/ to. Note that the control signals are deasserted (set to ) in the ID stage, since no instrction is being eected. In the following clock cycle (lower drawing), the stage writes the reslt to 3, thereby completing or, and the E stage passes the sm from the add in /E to E/. The instrctions after add are shown as inactive for pedagogical reasons.

25 .3 Advanced Topic: An Introdction to Digital Design Using a Hardware Design Langage 35.e5 IF: after<> ID: after<3> : after<> E: after<> : add,... /E E/ Add Clock 9 Reg [3 ] [3, ] Imm Gen Shift left Add Sm control Src Zero reslt Branch em Op [ 7] em emtoreg FIGURE e.3.5 Clock cycle 9. The stage writes the reslt in E/ into, completing add and the fiveinstrction seqence. The instrctions after add are shown as inactive for pedagogical reasons.

26 35.e6.3 Advanced Topic: An Introdction to Digital Design Using a Hardware Design Langage or,, and,, 5 sb,, 3 before<> before<> /E E/ Forwarding nit Clock 3 add 9,, or,, and,, 5 sb X,... before<> /E E/ 5 5 Forwarding nit Clock FIGURE e.3.6 Clock cycles 3 and of the instrction seqence on page 366.e6. The bold lines are those active in a clock cycle, and the italicized nmbers in color indicate a hazard. The forwarding nit is highlighted by shading it when it is forwarding to the. The instrctions before sb are shown as inactive jst to emphasize what occrs for the for instrctions in the eample. Operand names are sed in for control of forwarding; ths they are inclded in the instrction label for. Operand names are not needed in E or, so is sed. Compare this with Figres e.3. throgh e.3.5, which show the path withot forwarding where ID is the last stage to need operand information.

27 .3 Advanced Topic: An Introdction to Digital Design Using a Hardware Design Langage 35.e7 after<> add 9,, or,, and,... sb,.. /E E/ 9 Forwarding nit Clock 5 after<> after<> add 9,, or,... and,.. /E E/ 9 Forwarding nit Clock 6 FIGURE e.3.7 Clock cycles 5 and 6 of the instrction seqence on page 366.e6. The forwarding nit is highlighted when it is forwarding to the. The two instrctions after add are shown as inactive jst to emphasize what occrs for the for instrctions in the eample. The bold lines are those active in a clock cycle, and the italicized nmbers in color indicate a hazard.

28 35.e8.3 Advanced Topic: An Introdction to Digital Design Using a Hardware Design Langage and,, 5 ld, () before<> before<> before<3> X Hazard detection nit.em /E E/ X XX X.RegisterRd Forwarding nit Clock or,, and,, 5 5 Hazard detection nit.em ld, () before<> before<> /E E/ 5 5 XX 5 X.RegisterRd Forwarding nit Clock 3 FIGURE e.3.8 Clock cycles and 3 of the instrction seqence on page 366.e6 with a load replacing sb. The bold lines are those active in a clock cycle, the italicized nmbers in color indicate a hazard, and the in the place of operands means that their identity is information not needed by that stage. The vales of the significant control lines, s, and nmbers are labeled in the figres. The and instrction wants to read the vale created by the ld instrction in clock cycle 3, so the hazard detection nit stalls the and and or instrctions. Hence, the hazard detection nit is highlighted.

29 .3 Advanced Topic: An Introdction to Digital Design Using a Hardware Design Langage 35.e9 or,, and,, 5 Bbble ld,... before<> 5 Hazard detection nit.em /E E/ RegisterRd Forwarding nit Clock add 9,, or,, Hazard detection nit.em and,, 5 Bbble ld,... /E E/ 5 5.RegisterRd Forwarding nit Clock 5 FIGURE e.3.9 Clock cycles and 5 of the instrction seqence on page 366.e6 with a load replacing sb. The bbble is inserted in the pipeline in clock cycle, and then the and instrction is allowed to proceed in clock cycle 5. The forwarding nit is highlighted in clock cycle 5 becase it is forwarding from ld to the. Note that in clock cycle, the forwarding nit forwards the address of the ld as if it were the contents of ; this is rered harmless by the insertion of the bbble. The bold lines are those active in a clock cycle, and the italicized nmbers in color indicate a hazard.

30 35.e3.3 Advanced Topic: An Introdction to Digital Design Using a Hardware Design Langage after<> add 9,, or,, and,... Bbble Hazard detection nit.em /E E/ 9.RegisterRd Forwarding nit Clock 6 after<> after<> add 9,, or,... and,... Hazard detection nit.em /E E/ 9.RegisterRd Forwarding nit Clock 7 FIGURE e.3. Clock cycles 6 and 7 of the instrction seqence on page 366.e6 with a load replacing sb. Note that nlike in Figre e.3.7, the stall allows the ld to complete, and so there is no forwarding from E/ in clock cycle 6. Register for the add in the stage still deps on the reslt from or in /E, so the forwarding nit passes the reslt to the. The bold lines show inpt lines active in a clock cycle, and the italicized nmbers indicate a hazard. The instrctions after add are shown as inactive for pedagogical reasons.

4.13. An Introduction to Digital Design Using a Hardware Design Language to Describe and Model a Pipeline and More Pipelining Illustrations

4.13. An Introduction to Digital Design Using a Hardware Design Language to Describe and Model a Pipeline and More Pipelining Illustrations .3 An Introdction to Digital Design Using a Hardware Design Langage to Describe and odel a Pipeline and ore Pipelining Illstrations This online section covers hardware description langages and then gives