4.13. An Introduction to Digital Design Using a Hardware Design Language to Describe and Model a Pipeline and More Pipelining Illustrations

Size: px

Start display at page:

Download "4.13. An Introduction to Digital Design Using a Hardware Design Language to Describe and Model a Pipeline and More Pipelining Illustrations"

Dale Murphy
5 years ago
Views:

1 .3 An Introdction to Digital Design Using a Hardware Design Langage to Describe and odel a Pipeline and ore Pipelining Illstrations This online section covers hardware description langages and then gives a dozen eamples of pipeline diagrams, starting on page.3-8. As mentioned in Appi B, Verilog can describe processors for simlation or with the intention that the Verilog specification be synthesized. To achieve acceptable synthesis s in size and speed, and a behavioral specification inted for synthesis mst careflly delineate the highly combinational portions of the design, sch as a path, from the control. The path can then be synthesized sing available libraries. A Verilog specification inted for synthesis is sally longer and more comple. We start with a behavioral model of the five-stage pipeline. To illstrate the dichotomy between behavioral and synthesizable designs, we then give two Verilog descriptions of a mltiple-cycle-per-instrction IPS processor: one inted solely for simlations and one sitable for synthesis. This IPS architectre is satisfactory to demonstrate Verilog, bt if any reader bilds one for LEGv8 and wants to share it with others, please contact the pblisher. Using Verilog for Behavioral Specification with Simlation for the Five-Stage Pipeline Figre.3. shows a Verilog behavioral description of the pipeline that handles instrctions as well as loads and stores. It does not accommodate branches (even incorrectly!), which we postpone inclding ntil later in the chapter. Becase Verilog lacks the ability to define registers with named fields sch as strctres in C, we se several indepent registers for each pipeline register. We name these registers with a prefi sing the same convention; hence, IFIDIR is the IR portion of the IFID pipeline register. This version is a behavioral description not inted for synthesis. s take the same nmber of clock cycles as or hardware design, bt the control is done in a simpler fashion by repeatedly decoding fields of the instrction in each pipe stage. Becase of this difference, the instrction register (IR) is needed throghot the pipeline, and the entire IR is passed from pipe stage to pipe stage. As yo read the Verilog descriptions in this chapter, remember that the actions in the always block all occr in parallel on every clock cycle. Since there are no blocking assignments, the order of the events within the always block is arbitrary.

2 .3 An Introdction to Digital Design Using a Hardware Design Langage.3-3 modle CPU (clock); // opcodes parameter LW = 6 b, SW = 6 b, BEQ = 6 b, no-op = 3 b_, op = 6 b; inpt clock; reg[3:], Regs[:3], Iemory[:3], Demory[:3], // separate memories IFIDIR, IDA, IDB, IDIR, EIR, EB, // pipeline registers EOt, EVale, EIR; // pipeline registers wire [:] IDrs, IDrt, Erd, Erd, Ert; // Access register fields wire [5:] Eop, Eop, IDop; // Access opcodes wire [3:] Ain, Bin; // the inpts // These assignments define fields from the pipeline registers assign IDrs = IDIR[5:]; // rs field assign IDrt = IDIR[:6]; // rt field assign Erd = EIR[5:]; // rd field assign Erd = EIR[5:]; //rd field assign Ert = EIR[:6]; //rt field--sed for loads assign Eop = EIR[3:6]; // the opcode assign Eop = EIR[3:6]; // the opcode assign IDop = IDIR[3:6]; // the opcode // Inpts to the come directly from the pipeline registers assign Ain = IDA; assign Bin = IDB; reg [5:] i; //sed to initialize registers initial begin = ; IFIDIR = no-op; IDIR = no-op; EIR = no-op; EIR = no-op; // pt no-ops in pipeline registers for (i=;i<=3;i=i+) Regs[i] = i; //initialize registers--jst so they aren t cares (posedge clock) begin // Remember that ALL these actions happen every pipe stage and with the se of <= they happen in parallel! // first instrction in the pipeline is being fetched IFIDIR <= Iemory[>>]; <= + ; // Fetch & increment // second instrction in pipeline is fetching registers IDA <= Regs[IFIDIR[5:]]; IDB <= Regs[IFIDIR[:6]]; // get two registers IDIR <= IFIDIR; //pass along IR--can happen anywhere, since this affects net stage only! // third instrction is doing address calclation or operation if ((IDop==LW) (IDop==SW)) // address calclation EOt <= IDA +{{6{IDIR[5]}}, IDIR[5:]}; else if (IDop==op) case (IDIR[5:]) //case for the varios R-type instrctions 3: EOt <= Ain + Bin; //add operation defalt: ; //other R-type operations: sbtract, SLT, etc. case FIGURE.3. A Verilog behavioral model for the IPS five-stage pipeline, ignoring branch and hazards. As in the design earlier in Chapter, we se separate instrction and memories, which wold be implemented sing separate caches as we describe in Chapter 5.

3 .3-.3 An Introdction to Digital Design Using a Hardware Design Langage EIR <= IDIR; EB <= IDB; //pass along the IR & B register //em stage of pipeline if (Eop==op) EVale <= EOt; //pass along else if (Eop == LW) EVale <= Demory[EOt>>]; else if (Eop == SW) Demory[EOt>>] <=EB; //store EIR <= EIR; //pass along IR // the stage if ((Eop==op) & (Erd!= )) // pdate registers if operation and destination not Regs[Erd] <= EVale; // operation else if ((Eop == LW)& (Ert!= )) // Update registers if load and destination not Regs[Ert] <= EVale; modle FIGURE.3. A Verilog behavioral model for the IPS five-stage pipeline, ignoring branch and hazards. (Contined). Implementing Forwarding in Verilog To et the Verilog model frther, Figre.3. shows the addition of forwarding logic for the case when the sorce and destination are instrctions. Neither load stalls nor branches are handled; we will add these shortly. The changes from the earlier Verilog description are highlighted. Check Yorself Someone has proposed moving the write for a from an instrction from the to the E stage, pointing ot that this wold redce the maimm length of forwards from an instrction by one cycle. Which of the following is accrate reasons not to consider sch a change?. It wold not actally change the forwarding logic, so it has no advantage.. It is impossible to implement this change nder any circmstance since the write for the mst stay in the same pipe stage as the write for a load. 3. oving the write for instrctions wold create the possibility of writes occrring from two different instrctions dring the same clock cycle. Either an etra write port wold be reqired on the register file or a strctral hazard wold be created.. The of an instrction is not available in time to do the write dring E.

4 .3 An Introdction to Digital Design Using a Hardware Design Langage.3-5 modle CPU (clock); parameter LW = 6 b, SW = 6 b, BEQ = 6 b, no-op = 3 b_, op = 6 b; inpt clock; reg[3:], Regs[:3], Iemory[:3], Demory[:3], // separate memories IFIDIR, IDA, IDB, IDIR, EIR, EB, // pipeline registers EOt, EVale, EIR; // pipeline registers wire [:] IDrs, IDrt, Erd, Erd, Ert; //hold register fields wire [5:] Eop, Eop, IDop; Hold opcodes wire [3:] Ain, Bin; // declare the bypass signals wire bypassafrome, bypassafromin,bypassbfrome, bypassbfromin, bypassafromlwin, bypassbfromlwin; assign IDrs = IDIR[5:]; assign IDrt = IDIR[5:]; assign Erd = EIR[5:]; assign Erd = EIR[:6]; assign Eop = EIR[3:6]; assign Ert = EIR[5:]; assign Eop = EIR[3:6]; assign IDop = IDIR[3:6]; // The bypass to inpt A from the E stage for an operation assign bypassafrome = (IDrs == Erd) & (IDrs!=) & (Eop==op); // yes, bypass // The bypass to inpt B from the E stage for an operation assign bypassbfrome = (IDrt == Erd)&(IDrt!=) & (Eop==op); // yes, bypass // The bypass to inpt A from the stage for an operation assign bypassafromin =( IDrs == Erd) & (IDrs!=) & (Eop==op); // The bypass to inpt B from the stage for an operation assign bypassbfromin = (IDrt == Erd) & (IDrt!=) & (Eop==op); / // The bypass to inpt A from the stage for an LW operation assign bypassafromlwin =( IDrs == EIR[:6]) & (IDrs!=) & (Eop==LW); // The bypass to inpt B from the stage for an LW operation assign bypassbfromlwin = (IDrt == EIR[:6]) & (IDrt!=) & (Eop==LW); // The A inpt to the is bypassed from E if there is a bypass there, // Otherwise from if there is a bypass there, and otherwise comes from the ID register assign Ain = bypassafrome? EOt : (bypassafromin bypassafromlwin)? EVale : IDA; // The B inpt to the is bypassed from E if there is a bypass there, // Otherwise from if there is a bypass there, and otherwise comes from the ID register assign Bin = bypassbfrome? EOt : (bypassbfromin bypassbfromlwin)? EVale: IDB; reg [5:] i; //sed to initialize registers initial begin = ; IFIDIR = no-op; IDIR = no-op; EIR = no-op; EIR = no-op; // pt no-ops in pipeline registers for (i = ;i<=3;i = i+) Regs[i] = i; //initialize registers--jst so they aren t cares (posedge clock) begin // first instrction in the pipeline is being fetched IFIDIR <= Iemory[>>]; <= + ; // Fetch & increment FIGURE.3. A behavioral definition of the five-stage IPS pipeline with bypassing to operations and address calclations. The code added to Figre.3. to handle bypassing is highlighted. Becase these bypasses only reqire changing where the inpts come from, the only changes reqired are in the combinational logic responsible for selecting the inpts. (contines on net page)

5 An Introdction to Digital Design Using a Hardware Design Langage // second instrction is in register fetch IDA <= Regs[IFIDIR[5:]]; IDB <= Regs[IFIDIR[:6]]; // get two registers IDIR <= IFIDIR; //pass along IR--can happen anywhere, since this affects net stage only! // third instrction is doing address calclation or operation if ((IDop==LW) (IDop==SW)) // address calclation & copy B EOt <= IDA +{{6{IDIR[5]}}, IDIR[5:]}; else if (IDop==op) case (IDIR[5:]) //case for the varios R-type instrctions 3: EOt <= Ain + Bin; //add operation defalt: ; //other R-type operations: sbtract, SLT, etc. case EIR <= IDIR; EB <= IDB; //pass along the IR & B register //em stage of pipeline if (Eop==op) EVale <= EOt; //pass along else if (Eop == LW) EVale <= Demory[EOt>>]; else if (Eop == SW) Demory[EOt>>] <=EB; //store EIR <= EIR; //pass along IR // the stage if ((Eop==op) & (Erd!= )) Regs[Erd] <= EVale; // operation else if ((Eop == LW)& (Ert!= )) Regs[Ert] <= EVale; modle FIGURE.3. A behavioral definition of the five-stage IPS pipeline with bypassing to operations and address calclations. (Contined) The Behavioral Verilog with Stall Detection If we ignore branches, stalls for hazards in the IPS pipeline are confined to one simple case: loads whose s are crrently in the clock stage. Ths, eting the Verilog to handle a load with a destination that is either an instrction or an effective address calclation is reasonably straightforward, and Figre.3.3 shows the few additions needed. Check Yorself Someone has asked abot the possibility of hazards occrring throgh, contrary to throgh a register. Which of the following statements abot sch hazards is tre?. Since accesses only occr in the E stage, all operations are done in the same order as instrction eection, making sch hazards impossible in this pipeline.. Sch hazards are possible in this pipeline; we jst have not discssed them yet. 3. No pipeline can ever have a hazard involving, since it is the programmer s job to keep the order of references accrate.. emory hazards may be possible in some pipelines, bt they cannot occr in this particlar pipeline. 5. Althogh the pipeline control wold be obligated to maintain ordering among references to avoid hazards, it is impossible to design a pipeline where the references cold be ot of order.

6 .3 An Introdction to Digital Design Using a Hardware Design Langage.3-7 modle CPU (clock); parameter LW = 6 b, SW = 6 b, BEQ = 6 b, no-op = 3 b_, op = 6 b; inpt clock; reg[3:], Regs[:3], Iemory[:3], Demory[:3], // separate memories IFIDIR, IDA, IDB, IDIR, EIR, EB, // pipeline registers EOt, EVale, EIR; // pipeline registers wire [:] IDrs, IDrt, Erd, Erd, Ert; //hold register fields wire [5:] Eop, Eop, IDop; Hold opcodes wire [3:] Ain, Bin; // declare the bypass signals wire stall, bypassafrome, bypassafromin,bypassbfrome, bypassbfromin, bypassafromlwin, bypassbfromlwin; assign IDrs = IDIR[5:]; assign IDrt = IDIR[5:]; assign Erd = EIR[5:]; assign Erd = EIR[:6]; assign Eop = EIR[3:6]; assign Ert = EIR[5:]; assign Eop = EIR[3:6]; assign IDop = IDIR[3:6]; // The bypass to inpt A from the E stage for an operation assign bypassafrome = (IDrs == Erd) & (IDrs!=) & (Eop==op); // yes, bypass // The bypass to inpt B from the E stage for an operation assign bypassbfrome = (IDrt== Erd)&(IDrt!=) & (Eop==op); // yes, bypass // The bypass to inpt A from the stage for an operation assign bypassafromin =( IDrs == Erd) & (IDrs!=) & (Eop==op); // The bypass to inpt B from the stage for an operation assign bypassbfromin = (IDrt==Erd) & (IDrt!=) & (Eop==op); / // The bypass to inpt A from the stage for an LW operation assign bypassafromlwin =( IDrs ==EIR[:6]) & (IDrs!=) & (Eop==LW); // The bypass to inpt B from the stage for an LW operation assign bypassbfromlwin = (IDrt==EIR[:6]) & (IDrt!=) & (Eop==LW); // The A inpt to the is bypassed from E if there is a bypass there, // Otherwise from if there is a bypass there, and otherwise comes from the ID register assign Ain = bypassafrome? EOt : (bypassafromin bypassafromlwin)? EVale : IDA; // The B inpt to the is bypassed from E if there is a bypass there, // Otherwise from if there is a bypass there, and otherwise comes from the ID register assign Bin = bypassbfrome? EOt : (bypassbfromin bypassbfromlwin)? EVale: IDB; // The signal for detecting a stall based on the se of a from LW assign stall = (EIR[3:6]==LW) && // sorce instrction is a load ((((IDop==LW) (IDop==SW)) && (IDrs==Erd)) // stall for address calc ((IDop==op) && ((IDrs==Erd) (IDrt==Erd)))); // se reg [5:] i; //sed to initialize registers initial begin = ; IFIDIR = no-op; IDIR = no-op; EIR = no-op; EIR = no-op; // pt no-ops in pipeline registers for (i = ;i<=3;i = i+) Regs[i] = i; //initialize registers--jst so they aren t cares (posedge clock) begin if (~stall) begin // the first three pipeline stages stall if there is a load hazard FIGURE.3.3 A behavioral definition of the five-stage IPS pipeline with stalls for loads when the destination is an instrction or effective address calclation. The changes from Figre.3. are highlighted. (contines on net page)

7 An Introdction to Digital Design Using a Hardware Design Langage // first instrction in the pipeline is being fetched IFIDIR <= Iemory[>>]; <= + ; IDIR <= IFIDIR; //pass along IR--can happen anywhere, since this affects net stage only! // second instrction is in register fetch IDA <= Regs[IFIDIR[5:]]; IDB <= Regs[IFIDIR[:6]]; // get two registers // third instrction is doing address calclation or operation if ((IDop==LW) (IDop==SW)) // address calclation & copy B EOt <= IDA +{{6{IDIR[5]}}, IDIR[5:]}; else if (IDop==op) case (IDIR[5:]) //case for the varios R-type instrctions 3: EOt <= Ain + Bin; //add operation defalt: ; //other R-type operations: sbtract, SLT, etc. case EIR <= IDIR; EB <= IDB; //pass along the IR & B register else EIR <= no-op; /Freeze first three stages of pipeline; inject a nop into the otpt //em stage of pipeline if (Eop==op) EVale <= EOt; //pass along else if (Eop == LW) EVale <= Demory[EOt>>]; else if (Eop == SW) Demory[EOt>>] <=EB; //store EIR <= EIR; //pass along IR // the stage if ((Eop==op) & (Erd!= )) Regs[Erd] <= EVale; // operation else if ((Eop == LW)& (Ert!= )) Regs[Ert] <= EVale; modle FIGURE.3.3 A behavioral definition of the five-stage IPS pipeline with stalls for loads when the destination is an instrction or effective address calclation. (Contined) Implementing the Branch Hazard Logic in Verilog We can et or Verilog behavioral model to implement the control for branches. We add the code to model branch eqal sing a predict not taken strategy. The Verilog code is shown in Figre.3.. It implements the branch hazard by detecting a taken branch in ID and sing that signal to sqash the instrction in IF (by setting the IR to, which is an effective no-op in IPS-3); in addition, the is assigned to the branch target. Note that to prevent an nepected latch, it is important that the is clearly assigned on every path throgh the always block; hence, we assign the in a single if statement. Lastly, note that althogh Figre.3. incorporates the basic logic for branches and control hazards, spporting branches reqires additional bypassing and hazard detection, which we have not inclded.

8 .3 An Introdction to Digital Design Using a Hardware Design Langage.3-9 modle CPU (clock); parameter LW = 6 b, SW = 6 b, BEQ = 6 b, no-op = 3 b, op = 6 b; inpt clock; reg[3:], Regs[:3], Iemory[:3], Demory[:3], // separate memories IFIDIR, IDA, IDB, IDIR, EIR, EB, // pipeline registers EOt, EVale, EIR; // pipeline registers wire [:] IDrs, IDrt, Erd, Erd; //hold register fields wire [5:] Eop, Eop, IDop; Hold opcodes wire [3:] Ain, Bin; // declare the bypass signals wire takebranch, stall, bypassafrome, bypassafromin,bypassbfrome, bypassbfromin, bypassafromlwin, bypassbfromlwin; assign IDrs = IDIR[5:]; assign IDrt = IDIR[5:]; assign Erd = EIR[5:]; assign Erd = EIR[:6]; assign Eop = EIR[3:6]; assign Eop = EIR[3:6]; assign IDop = IDIR[3:6]; // The bypass to inpt A from the E stage for an operation assign bypassafrome = (IDrs == Erd) & (IDrs!=) & (Eop==op); // yes, bypass // The bypass to inpt B from the E stage for an operation assign bypassbfrome = (IDrt == Erd)&(IDrt!=) & (Eop==op); // yes, bypass // The bypass to inpt A from the stage for an operation assign bypassafromin =( IDrs == Erd) & (IDrs!=) & (Eop==op); // The bypass to inpt B from the stage for an operation assign bypassbfromin = (IDrt == Erd) & (IDrt!=) & (Eop==op); / // The bypass to inpt A from the stage for an LW operation assign bypassafromlwin =( IDrs == EIR[:6]) & (IDrs!=) & (Eop==LW); // The bypass to inpt B from the stage for an LW operation assign bypassbfromlwin = (IDrt == EIR[:6]) & (IDrt!=) & (Eop==LW); // The A inpt to the is bypassed from E if there is a bypass there, // Otherwise from if there is a bypass there, and otherwise comes from the ID register assign Ain = bypassafrome? EOt : (bypassafromin bypassafromlwin)? EVale : IDA; // The B inpt to the is bypassed from E if there is a bypass there, // Otherwise from if there is a bypass there, and otherwise comes from the ID register assign Bin = bypassbfrome? EOt : (bypassbfromin bypassbfromlwin)? EVale: IDB; // The signal for detecting a stall based on the se of a from LW assign stall = (EIR[3:6]==LW) && // sorce instrction is a load ((((IDop==LW) (IDop==SW)) && (IDrs==Erd)) // stall for address calc ((IDop==op) && ((IDrs==Erd) (IDrt==Erd)))); // se FIGURE.3. A behavioral definition of the five-stage IPS pipeline with stalls for loads when the destination is an instrction or effective address calclation. The changes from Figre.3. are highlighted. (contines on net page)

9 .3-.3 An Introdction to Digital Design Using a Hardware Design Langage // Signal for a taken branch: instrction is BEQ and registers are eqal assign takebranch = (IFIDIR[3:6]==BEQ) && (Regs[IFIDIR[5:]]== Regs[IFIDIR[:6]]); reg [5:] i; //sed to initialize registers initial begin = ; IFIDIR = no-op; IDIR = no-op; EIR = no-op; EIR = no-op; // pt no-ops in pipeline registers for (i = ;i<=3;i = i+) Regs[i] = i; //initialize registers--jst so they aren t don t cares (posedge clock) begin if (~stall) begin // the first three pipeline stages stall if there is a load hazard if (~takebranch) begin // first instrction in the pipeline is being fetched normally IFIDIR <= Iemory[>>]; <= + ; else begin // a taken branch is in ID; instrction in IF is wrong; insert a no-op and reset the IFDIR <= no-op; <= + + ({{6{IFIDIR[5]}}, IFIDIR[5:]}<<); // second instrction is in register fetch IDA <= Regs[IFIDIR[5:]]; IDB <= Regs[IFIDIR[:6]]; // get two registers // third instrction is doing address calclation or operation IDIR <= IFIDIR; //pass along IR if ((IDop==LW) (IDop==SW)) // address calclation & copy B EOt <= IDA +{{6{IDIR[5]}}, IDIR[5:]}; else if (IDop==op) case (IDIR[5:]) //case for the varios R-type instrctions 3: EOt <= Ain + Bin; //add operation defalt: ; //other R-type operations: sbtract, SLT, etc. case EIR <= IDIR; EB <= IDB; //pass along the IR & B register else EIR <= no-op; /Freeze first three stages of pipeline; inject a nop into the otpt //em stage of pipeline if (Eop==op) EVale <= EOt; //pass along else if (Eop == LW) EVale <= Demory[EOt>>]; else if (Eop == SW) Demory[EOt>>] <=EB; //store // the stage EIR <= EIR; //pass along IR if ((Eop==op) & (Erd!= )) Regs[Erd] <= EVale; // operation else if ((Eop == LW)& (EIR[:6]!= )) Regs[EIR[:6]] <= EVale; modle FIGURE.3. A behavioral definition of the five-stage IPS pipeline with stalls for loads when the destination is an instrction or effective address calclation. (Contined)

10 .3 An Introdction to Digital Design Using a Hardware Design Langage.3- Using Verilog for Behavioral Specification with Synthesis To demonstrate the contrasting types of Verilog, we show two descriptions of a different, nonpipelined implementation style of IPS that ses mltiple clock cycles per instrction. (Since some instrctors make a synthesizable description of the IPS pipeline project for a class, we chose not to inclde it here. It wold also be long.) Figre.3.5 gives a behavioral specification of a mlticycle implementation of the IPS processor. Becase of the se of behavioral operations, it wold be difficlt to synthesize a separate path and control nit with any reasonable efficiency. This version demonstrates another approach to the control by sing a ealy finite-state machine (see discssion in Section A. of Appi A). The se of a ealy machine, which allows the otpt to dep both on inpts and the crrent state, allows s to decrease the total nmber of states. Since a version of the IPS design inted for synthesis is considerably more comple, we have relied on a nmber of Verilog modles that were specified in Appi A, inclding the following: The -to- mltipleor shown in Figre A.., and the 3-to- mltipleor that can be trivially derived based on the -to- mltipleor. The IPS shown in Figre A.5.5. The IPS control defined in Figre A.5.6. The IPS register file defined in Figre A.8.. Now, let s look at a Verilog version of the IPS processor inted for synthesis. Figre.3.6 shows the strctral version of the IPS path. Figre.3.7 ses the path modle to specify the IPS CPU. This version also demonstrates another approach to implementing the control nit, as well as some optimizations that rely on relationships between varios control signals. Observe that the state machine specification only provides the seqencing actions. The setting of the control lines is done with a series of assign statements that dep on the state as well as the opcode field of the instrction register. If one were to fold the setting of the control into the state specification, this wold look like a ealy-style finite-state control nit. Becase the setting of the control lines is specified sing assign statements otside of the always block, most logic synthesis systems will generate a small implementation of a finite-state machine that determines the setting of the state register and then ses eternal logic to derive the control inpts to the path. In writing this version of the control, we have also taken advantage of a nmber of insights abot the relationship between varios control signals as well as sitations where we don t care abot the control signal vale; some eamples of these are given in the following elaboration.

11 .3-.3 An Introdction to Digital Design Using a Hardware Design Langage modle CPU (clock); parameter LW = 6 b, SW = 6 b, BEQ=6 b, J=6 d; inpt clock; //the clock is an eternal inpt // The architectrally visible registers and scratch registers for implementation reg [3:], Regs[:3], emory [:3], IR, Ot, DR, A, B; reg [:] state; // processor state wire [5:] opcode; //se to get opcode easily wire [3:] SignEt,Offset; //sed to get sign-eted offset field assign opcode = IR[3:6]; //opcode is pper 6 bits assign SignEt = {{6{IR[5]}},IR[5:]}; //sign etension of lower 6 bits of instrction assign Offset = SignEt << ; // offset is shifted // set the to and start the control in state initial begin = ; state = ; //The state machine--triggered on a rising clock clock) begin Regs[] = ; //make R //shortct way to make sre R is always case (state) //action deps on the state : begin // first step: fetch the instrction, increment, go to net state IR <= emory[>>]; <= + ; state = ; //net state : begin // second step: decode, register fetch, also compte branch address A <= Regs[IR[5:]]; B <= Regs[IR[:6]]; state = 3; Ot <= + Offset; // compte -relative branch target 3: begin // third step: Load-store eection, eection, Branch completion state = ; // defalt net state if ((opcode==lw) (opcode==sw)) Ot <= A + SignEt; //compte effective address else if (opcode==6 b) case (IR[5:]) //case for the varios R-type instrctions 3: Ot = A + B; //add operation defalt: Ot = A; //other R-type operations: sbtract, SLT, etc. case FIGURE.3.5 A behavioral specification of the mlticycle IPS design. Th is has the same cycle behavior as the mlticycle design, bt is prely for simlation and specification. It cannot be sed for synthesis. (contines on net page)

12 .3 An Introdction to Digital Design Using a Hardware Design Langage.3-3 else if (opcode == BEQ) begin if (A==B) <= Ot; // branch taken--pdate state = ; else if (opocde=j) begin = {[3:8], IR[5:], b}; // the branch target state = ; //Jmps else ; // other opcodes or eception for ndefined instrction wold go here : begin if (opcode==6 b) begin // Operation Regs[IR[5:]] <= Ot; // write the state = ; //R-type finishes else if (opcode == LW) begin // load instrction DR <= emory[ot>>]; // read the state = 5; // net state else if (opcode == LW) begin emory[ot>>] <= B; // write the state = ; // retrn to state //store finishes else ; // other instrctions go here 5: begin // LW is the only instrction still in eection Regs[IR[:6]] = DR; // write the DR to the register state = ; //complete an LW instrction case modle FIGURE.3.5 A behavioral specification of the mlticycle IPS design. (Contined)

13 .3-.3 An Introdction to Digital Design Using a Hardware Design Langage modle path (Op, RegDst, emtoreg, em, em, IorD, Reg, IR,, Cond, SrcA, SrcB, Sorce, opcode, clock); // the control inpts + clock inpt [:] Op, SrcB, Sorce; // -bit control signals inpt RegDst, emtoreg, em, em, IorD, Reg, IR,, Cond, SrcA, clock; // -bit control signals otpt [5:] opcode ;// opcode is needed as an otpt by control reg [3:], emory [:3], DR,IR, Ot; // CPU state + some temporaries wire [3:] A,B,SignEtOffset, Offset, ResltOt, Vale, Jmpr,, Ain, Bin,emOt; / these are signals derived from registers wire [3:] Ctl; //. the control lines wire Zero; the Zero ot signal from the wire[:] reg;// the signal sed to commnicate the destination register initial = ; //start the at //Combinational signals sed in the path // sing word address with either Ot or as the address sorce assign emot = em? emory[(iord? Ot : )>>]:; assign opcode = IR[3:6];// opcode shortct // Get the write register address from one of two fields deping on RegDst assign reg = RegDst? IR[5:]: IR[:6]; // Get the write register either from the Ot or from the DR assign = emtoreg? DR : Ot; // Sign-et the lower half of the IR from load/store/branch offsets assign SignEtOffset = {{6{IR[5]}},IR[5:]}; //sign-et lower 6 bits; // The branch offset is also shifted to make it a word offset assign Offset = SignEtOffset << ; // The A inpt to the is either the rs register or the assign Ain = SrcA? A : ; // inpt is or A // Compose the Jmp address assign Jmpr = {[3:8], IR[5:], b}; //The jmp address FIGURE.3.6 A Verilog version of the mlticycle IPS path that is appropriate for synthesis. This path relies on several nits from Appi A. Initial statements do not synthesize, and a version sed for synthesis wold have to incorporate a reset signal that had this effect. Also note that resetting R to on every clock is not the best way to ensre that R stays at ; instead, modifying the register file modle to prodce whenever R is read and to ignore writes to R wold be a more efficient soltion. (contines on net page)

14 .3 An Introdction to Digital Design Using a Hardware Design Langage.3-5 // Creates an instance of the control nit (see the modle defined in Figre B.5.6 // Inpt Op is control-nit set and sed to describe the instrction class as in Chapter // Inpt IR[5:] is the fnction code field for an instrction // Otpt Ctl are the actal control bits as in Chapter alcontroller (Op,IR[5:],Ctl); // control nit // Creates a 3-to- mltipleor sed to select the sorce of the net // Inpts are ResltOt (the incremented ), Ot (the branch address), the jmp target address // Sorce is the selector inpt and Vale is the mltipleor otpt lt3to src (ResltOt,Ot,Jmpr, Sorce, Vale); // Creates a -to- mltipleor sed to select the B inpt of the // Inpts are register B,constant, sign-eted lower half of IR, sign-eted lower half of IR << // SrcB is the selector inpt // Bin is the mltipleor otpt ltto Binpt (B,3 d,signetoffset,offset,srcb,bin); // Creates a IPS // Inpts are Ctl (the control), vale inpts (Ain, Bin) // Otpts are ResltOt (the 3-bit otpt) and Zero (zero detection otpt) IPS (Ctl, Ain, Bin, ResltOt,Zero); //the // Creates a IPS register file // Inpts are // the rs and rt fields of the IR sed to specify which registers to read, // reg (the write register nmber), (the to be written), Reg (indicates a write), the clock // Otpts are A and B, the registers read registerfile regs (IR[5:],IR[:6],reg,,Reg,A,B,clock); //Register file // The clock-triggered actions of the path clock) begin if (em) emory[ot>>] <= B; // --mst be a store Ot <= ResltOt; //Save the for se on a later clock cycle if (IR) IR <= emot; // the IR if an instrction fetch DR <= emot; // Always save the read vale // The is written both conditionally (controlled by ) and nconditionally if ( (Cond & Zero)) <=Vale; modle FIGURE.3.6 A Verilog version of the mlticycle IPS path that is appropriate for synthesis.

15 An Introdction to Digital Design Using a Hardware Design Langage modle CPU (clock); parameter LW = 6 b, SW = 6 b, BEQ = 6 b, J = 6 d; //constants inpt clock; reg [:] state; wire [:] Op, SrcB, Sorce; wire [5:] opcode; wire RegDst, em, em, IorD, Reg, IR,, Cond, SrcA, emoryop, IRWwrite, emreg; // Create an instance of the IPS path, the inpts are the control signals; opcode is only otpt path IPSDP (Op,RegDst,emReg, em, em, IorD, Reg, IR,, Cond, SrcA, SrcB, Sorce, opcode, clock); initial begin state = ; // start the state machine in state // These are the definitions of the control signals assign IR = (state==); assign emreg = ~ RegDst; assign emoryop = (opcode==lw) (opcode==sw); // a operation assign Op = ((state==) (state==) ((state==3)&emoryop))? b : // add ((state==3)&(opcode==beq))? b : b; // sbtract or se fnction code assign RegDst = ((state==)&(opcode==))? : ; assign em = (state==) ((state==)&(opcode==lw)); assign em = (state==)&(opcode==sw); assign IorD = (state==)? : (state==)? : X; assign Reg = (state==5) ((state==) &(opcode==)); assign = (state==) ((state==3)&(opcode==j)); assign Cond = (state==3)&(opcode==beq); assign SrcA = ((state==) (state==))? :; assign SrcB = ((state==) ((state==3)&(opcode==beq)))? b : (state==)? b : ((state==3)&emoryop)? b : b; // operation or other assign Sorce = (state==)? b : ((opcode==beq)? b : b); // Here is the state machine, which only has to seqence states clock) begin // all state pdates on a positive clock edge case (state) : state = ; //nconditional net state : state = 3; //nconditional net state 3: // third step: jmps and branches complete state = ((opcode==beq) (opcode==j))? : ;// branch or jmp go back else net state : state = (opcode==lw)? 5 : ; //R-type and SW finish 5: state = ; // go back case modle FIGURE.3.7 The IPS CPU sing the path from Figre.3.6.

16 .3 An Introdction to Digital Design Using a Hardware Design Langage.3-7 Elaboration When specifying control, designers often take advantage of knowledge of the control so as to simplify or shorten the control specifi cation. Here are a few eamples from the specification in Figres.3.6 and emtoreg is set only in two cases, and then it is always the inverse of RegLoc, so we jst se the inverse of RegLoc.. IR is set only in state. 3. The does not operate in every state and, when nsed, can safely do anything.. RegLoc is in only one case and can elsewhere be set to. In practice, it might be better to set it eplicitly when needed and otherwise set it to X, as we do for IorD. First, it allows frther logic optimization possibilities throgh the eploitation of don t-care terms (see Appi A for frther discssion and eamples). Second, it is a more precise specification, and this allows the simlation to more closely model the hardware, possibly ncovering additional errors in the specification. ore Illstrations of Eection on the Hardware To redce the cost of this book, starting with the third edition, we moved sections and figres that were sed by a minority of instrctors online. This sbsection recaptres those figres for readers who wold like more spplemental material to nderstand pipelining better. These are all single-clock-cycle pipeline diagrams, which take many figres to illstrate the eection of a seqence of instrctions. The three eamples are respectively for code with no hazards, an eample of forwarding on the pipelined implementation, and an eample of bypassing on the pipelined implementation. No Hazard Illstrations On page 38, we gave the eample code seqence LDUR SUB ADD LDUR ADD X, [X,#] X, X, X3 X, X3, X X3, [X,#8] X, X5, X6 Figres. and.3 showed the mltiple-clock-cycle pipeline diagrams for this two-instrction seqence eecting across si clock cycles. Figres.3.8 throgh.3. show the corresponding single-clock-cycle pipeline diagrams for these two instrctions. Note that the order of the instrctions differs between these two types of diagrams: the newest instrction is at the bottom and to the right of the mltiple-clock-cycle pipeline diagram, and it is on the left in the single-clock-cycle pipeline diagram.

17 An Introdction to Digital Design Using a Hardware Design Langage LDUR X, [X, ] fetch /E E/ ress register register register 3 Signet 6 Shift left Zero ress Clock SUB X, X, X3 LDUR X, [X, ] fetch decode /E E/ ress register register register 3 Signet 6 Shift left Zero ress Clock FIGURE.3.8 Single-cycle pipeline diagrams for clock cycles (top diagram) and (bottom diagram). This style of pipeline representation is a snapshot of every instrction eecting dring one clock cycle. Or eample has bt two instrctions, so at most two stages are identified in each clock cycle; normally, all five stages are occpied. The highlighted portions of the path are active in that clock cycle. The load is fetched in clock cycle and decoded in clock cycle, with the sbtract fetched in the second clock cycle. To make the figres easier to nderstand, the other pipeline stages are empty, bt normally there is an instrction in every pipeline stage.

18 .3 An Introdction to Digital Design Using a Hardware Design Langage.3-9 SUB X, X, X3 decode LDUR X, [X, ] Eection /E E/ ress register register register 3 Signet 6 Shift left Zero ress Clock 3 SUB X, X, X3 Eection LDUR X, [X, ] emory /E E/ ress register register register Shift left Zero ress 3 Signet 6 Clock FIGURE.3.9 Single-cycle pipeline diagrams for clock cycles 3 (top diagram) and (bottom diagram). In the third clock cycle in the top diagram, LDUR enters the stage. At the same time, SUB enters ID. In the forth clock cycle (bottom path), LDUR moves into E stage, reading sing the address fond in /E at the beginning of clock cycle. At the same time, the sbtracts and then places the difference into /E at the of the clock cycle.

19 .3-.3 An Introdction to Digital Design Using a Hardware Design Langage SUB X, X, X3 emory LDUR X, [X, ] back /E E/ ress register register register 3 Signet 6 Shift left Zero ress Clock 5 SUB X, X, X3 back /E E/ ress register register register 3 Signet 6 Shift left Zero ress Clock 6 FIGURE.3. Single-cycle pipeline diagrams for clock cycles 5 (top diagram) and 6 (bottom diagram). In clock cycle 5, LDUR completes by writing the in E/ into register, and SUB ss the difference in /E to E/. In the net clock cycle, SUB writes the vale in E/ to register.

20 .3 An Introdction to Digital Design Using a Hardware Design Langage.3- IF:LDUR X, [X, ] ID: before<> : before<> E: before<3> : before<> /E E/ ress Clock Reg register register register [3 ] [ ] Shift left [3 ] Signet control Src Zero Op Branch em ress em emtoreg IF: SUB X, X, X3 ID: LDUR X, [X, ] : before<> E: before<> : before<3> /E E/ LDUR ress Clock X Reg register register register [3 ] [3 ] [ ] X X Signet 986 Shift left control Src Zero Op Branch em ress em emtoreg FIGURE.3. Clock cycles and. The phrase before < i> means the i th instrction before LDUR. The LDUR instrction in the top path is in the IF stage. At the of the clock cycle, the LDUR instrction is in the pipeline registers. In the second clock cycle, seen in the bottom path, the LDUR moves to the ID stage, and SUB enters in the IF stage. Note that the vales of the instrction fields and the selected sorce registers are shown in the ID stage. Hence, register X and the constant, the operands of LDUR, are written into the pipeline register. The nmber, representing the destination register nmber of LDUR, is also placed in. The top of the pipeline register shows the control vales for LDUR to be sed in the remaining stages. These control vales can be read from the LDUR row of the table in Figre.8.

21 .3-.3 An Introdction to Digital Design Using a Hardware Design Langage IF: AND X, X, X5 ID: SUB X, X, X3 : LDUR X E: before<> : before<> SUB /E E/ ress Clock 3 3 Reg register register register [3 ] [3 ] [ ] X X3 Signet X 6 Shift left X 986 Src Zero control Op Branch em ress em emtoreg IF: ORR X3, X6, X7 ID: AND X, X, X5 : SUB X,... E: LDUR X,... : before<> /E E/ AND ress Clock 5 Reg register register register [3 ] [3 ] [ ] Signet X X X5 Shift left X X3 6 Src control Zero Op Branch em ress em emtoreg FIGURE.3. Clock cycles 3 and. In the top diagram, LDUR enters the stage in the third clock cycle, adding X and to form the address in the /E pipeline register. (The LDUR instrction is written LDUR X, pon reaching, becase the identity of instrction operands is not needed by or the sbseqent stages. In this version of the pipeline, the actions of, E, and dep only on the instrction and its destination register or its target address.) At the same time, SUB enters ID, reading registers X and X3, and the and instrction starts IF. In the forth clock cycle (bottom path), LDUR moves into E stage, reading sing the vale in /E as the address. In the same clock cycle, the sbtracts X3 from X and places the difference into /E, reads registers X and X5 dring ID, and the or instrction enters IF. The two diagrams show the control signals being created in the ID stage and peeled off as they are sed in sbseqent pipe stages.

22 .3 An Introdction to Digital Design Using a Hardware Design Langage.3-3 IF: ADD X, X8, X9 ID: ORR X3, X6, X7 : AND X,... E: SUB X,... : LDUR X,.. /E E/ ORR ress Clock Reg register register register [3 ] [3 ] [5 ] X6 X7 Signet X 36 Shift left X X5 3 Src Zero control Op Branch em ress em emtoreg IF: after<> ID: ADD X, X8, X9 : ORR X3,... E: AND X,... : SUB X,. /E E/ ADD ress Clock Reg register register register [3 ] [3 ] [5 ] Signet X X8 X9 Shift left X6 X7 36 Src control Zero Op Branch em 3 em ress emtoreg FIGURE.3.3 Clock cycles 5 and 6. Wit h ADD, the final instrction in this eample, entering IF in the top path, all instrctions are engaged. By writing the in E/ into register, LDUR completes; both the and the register nmber are in E/. In the same clock cycle, SUB ss the difference in /E to E/, and the rest of the instrctions move forward. In the net clock cycle, SUB selects the vale in E/ to write to register nmber, again fond in E/. The remaining instrctions play follow-theleader: the calclates the OR of X6 and X7 for the ORR instrction in the stage, and registers X8 and X9 are read in the ID stage for the ADD instrction. The instrctions after ADD are shown as inactive jst to emphasize what occrs for the five instrctions in the eample. The phrase after < i> means the i th instrction after ADD.

23 .3-.3 An Introdction to Digital Design Using a Hardware Design Langage IF: after<> ID: after<> : ADD X,... E: ORR X3,... : AND X,. /E E/ ress Clock 7 Reg register register register [3 ] [ ] Shift left X8 X9 Src Zero Op Branch em ress em 3 emtoreg IF: after<3> ID: after<> : after<> E: ADD X,... : ORR X3,.. /E E/ ress Clock 8 3 Reg register register register [3 ] [3 ] [3 ] Signet control Signet Shift left Src control Zero Branch em Op [ ] 3 em ress emtoreg FIGURE.3. Clock cycles 7 and 8. In the top path, the ADD instrction brings p the rear, adding the vales corresponding to registers X8 and X9 dring the stage. The of the or instrction is passed from /E to E/ in the E stage, and the stage writes the of the and instrction in E/ to register X. Note that the control signals are deasserted (set to ) in the ID stage, since no instrction is being eected. In the following clock cycle (lower drawing), the stage writes the to register X3, thereby completing ORR, and the E stage passes the sm from the ADD in /E to E/. The instrctions after ADD are shown as inactive for pedagogical reasons.

24 .3 An Introdction to Digital Design Using a Hardware Design Langage.3-5 IF: after<> ID: after<3> : after<> E: after<> : ADD X,. /E E/ ress Clock 9 Reg register register register [3 ] [3 ] Signet Shift left control Src Zero Branch em Op [ ] em ress emtoreg FIGURE.3.5 Clock cycle 9. The stage writes the sm in E/ into register X, completing ADD and the five-instrction seqence. The instrctions after ADD are shown as inactive for pedagogical reasons. ore Eamples To nderstand how pipeline control works, let s consider these five instrctions going throgh the pipeline: LDUR SUB AND ORR ADD X, [X,#] X, X, X3 X, X, X5 X3, X6, X7 X, X8, X9 Figres.3. throgh.3.5 show these instrctions proceeding throgh the nine clock cycles it takes them to complete eection, highlighting what is active in a stage and identifying the instrction associated with each stage dring a clock cycle. If yo eamine them careflly, yo may notice: In Figre.3.3 yo can see the seqence of the destination register nmbers from left to right at the bottom of the pipeline registers. The nmbers advance

25 An Introdction to Digital Design Using a Hardware Design Langage to the right dring each clock cycle, with the E/ pipeline register spplying the nmber of the register written dring the stage. When a stage is inactive, the vales of control lines that are deasserted are shown as or X (for don t care). Seqencing of control is embedded in the pipeline strctre itself. First, all instrctions take the same nmber of clock cycles, so there is no special control for instrction dration. Second, all control information is compted dring instrction decode and then passed along by the pipeline registers. Forwarding Illstrations We can se the single-clock-cycle pipeline diagrams to show how forwarding operates, as well as how the control activates the forwarding paths. Consider the following code seqence in which the depences have been highlighted: SUB X, X, X3 AND X, X, X5 ORR X, X, X ADD X9, X, X Figres.3.6 and.3.7 show the events in clock cycles 3 6 in the eection of these instrctions. Ths, in clock cycle 5, the forwarding nit selects the /E pipeline register for the pper inpt to the and the E/ pipeline register for the lower inpt to the. The following ADD instrction reads both register X, the target of the AND instrction, and register X, which the SUB instrction has already written. Notice that the prior two instrctions both write register X, so the forwarding nit mst pick the immediately preceding one (E stage). In clock cycle 6, the forwarding nit ths selects the /E pipeline register, containing the of the or instrction, for the pper inpt bt ses the non-forwarding register vale for the lower inpt to the. Illstrating Pipelines with Stalls and Forwarding We can se the single-clock-cycle pipeline diagrams to show how the control for stalls works. Figres.3.8 throgh.3. show the single-cycle diagram for clocks throgh 7 for the following code seqence (depences highlighted): LDUR X,[X,#] AND X, X,X5 ORR X, X, X ADD X9, X, X

26 .3 An Introdction to Digital Design Using a Hardware Design Langage.3-7 ORR X, X, X AND X, X, X5 SUB X, X, X3 before<> before<> /E E/ X X 5 X5 X3 5 3 Forwarding nit Clock 3 ADD X9, X, X ORR X, X, X AND X, X, X5 SUB X,... before<> /E E/ X X X X5 5 Forwarding nit Clock FIGURE.3.6 Clock cycles 3 and of the instrction seqence on page.3-6. The bold lines are those active in a clock cycle, and the italicized register nmbers in color indicate a hazard. The forwarding nit is highlighted by shading it when it is forwarding to the. The instrctions before SUB are shown as inactive jst to emphasize what occrs for the for instrctions in the eample. Operand names are sed in for control of forwarding; ths they are inclded in the instrction label for. Operand names are not needed in E or, so is sed. Compare this with Figres.3. throgh.3.5, which show the path withot forwarding where ID is the last stage to need operand information.

27 An Introdction to Digital Design Using a Hardware Design Langage after<> ADD X9, X, X ORR X, X, X AND X,... SUB X,.. /E E/ X X X X 9 Forwarding nit Clock 5 after<> after<> ADD X9, X, X ORR X,... AND X,.. /E E/ X X 9 Forwarding nit Clock 6 FIGURE.3.7 Clock cycles 5 and 6 of the instrction seqence on page.3-6. The forwarding nit is highlighted when it is forwarding to the. The two instrctions after ADD are shown as inactive jst to emphasize what occrs for the for instrctions in the eample. The bold lines are those active in a clock cycle, and the italicized register nmbers in color indicate a hazard.

28 .3 An Introdction to Digital Design Using a Hardware Design Langage.3-9 AND X, X, X5 LDUR X, [X, ] before<> before<> before<3> X Hazard detection nit.em /E E/ X X XX X.RegisterRd Forwarding nit Clock ORR X, X, X and X, X, X5 5 Hazard detection nit.em LDUR X, [X, ] before<> before<> /E E/ 5 X X5 X XX 5 X.RegisterRd Forwarding nit Clock 3 FIGURE.3.8 Clock cycles and 3 of the instrction seqence on page.3-6 with a load replacing SUB. The bold lines are those active in a clock cycle, the italicized register nmbers in color indicate a hazard, and the in the place of operands means that their identity is information not needed by that stage. The vales of the significant control lines, registers, and register nmbers are labeled in the figres. The AND instrction wants to read the vale created by the LDUR instrction in clock cycle 3, so the hazard detection nit stalls the AND and ORR instrctions. Hence, the hazard detection nit is highlighted.

29 An Introdction to Digital Design Using a Hardware Design Langage ORR X, X, X AND X, X, X5 Bbble LDUR X,... before<> 5 Hazard detection nit.em /E E/ 5 X X5 X X5 5 5.RegisterRd Forwarding nit Clock ADD X9, X, X ORR X, X, X 5 Hazard detection nit.em AND X, X, X5 Bbble LDUR X,... /E E/ X X X X5 5.RegisterRd Forwarding nit Clock 5 FIGURE.3.9 Clock cycles and 5 of the instrction seqence on page.3-6 with a load replacing SUB. The bbble is inserted in the pipeline in clock cycle, and then the and instrction is allowed to proceed in clock cycle 5. The forwarding nit is highlighted in clock cycle 5 becase it is forwarding from LDUR to the. Note that in clock cycle, the forwarding nit forwards the address of the lw as if it were the contents of register X ; this is rered harmless by the insertion of the bbble. The bold lines are those active in a clock cycle, and the italicized register nmbers in color indicate a hazard.

30 .3 An Introdction to Digital Design Using a Hardware Design Langage.3-3 after<> ADD X9, X, X ORR X, X, X AND X,... Bbble Hazard detection nit.em /E E/ X X X X 9.RegisterRd Forwarding nit Clock 6 after<> after<> ADD X9, X, X ORR X,... AND X,... Hazard detection nit.em /E E/ X X 9.RegisterRd Forwarding nit Clock 7 FIGURE.3. Clock cycles 6 and 7 of the instrction seqence on page.3-6 with a load replacing SUB. Note that nlike in Figre.3.7, the stall allows the LDUR to complete, and so there is no forwarding from E/ in clock cycle 6. Register X for the ADD in the stage still deps on the from or in /E, so the forwarding nit passes the to the. The bold lines show inpt lines active in a clock cycle, and the italicized register nmbers indicate a hazard. The instrctions after ADD are shown as inactive for pedagogical reasons.

4.13 Advanced Topic: An Introduction to Digital Design Using a Hardware Design Language 345.e1

4.13 Advanced Topic: An Introduction to Digital Design Using a Hardware Design Language 345.e1 .3 Advanced Topic: An Introdction to Digital Design Using a Hardware Design Langage 35.e.3 Advanced Topic: An Introdction to Digital Design Using a Hardware Design Langage to Describe and odel a Pipeline