EXERCISE 3: DLX II - Control

TAMPERE UNIVERSITY OF TECHNOLOGY
Institute of Digital and Computer Systems
TKT-3200 Computer Architectures I
EXERCISE 3: DLX II - Control
.. 2007
Group:
Name / Email / Student nr.:

DLX-CONTROL

Purpose of this exercise

The purpose of this exercise is to help you understand how the control logic of a pipelined processor is implemented. It is highly recommended that you finish the exercise DLX-ARCHITECTURE before starting this one.

Procedure calls in a pipelined processor

A pipelined processor such as DLX supports procedure calls differently from an unpipelined processor such as the M68000 (Motorola 68000). The M68000 always stores the return address of a subroutine jump on the system stack, and one of the processor's registers is reserved as a pointer to this stack. Since the entire stack resides in memory, a procedure call causes a memory write and a return from a procedure causes a memory read.

If a procedure does not call another procedure, the memory access for the return address can be eliminated by storing the return address in a register. DLX and many other pipelined processors use this technique: a dedicated register holds the return address, and DLX uses register R31 for this purpose. A procedure is called with the instruction JAL offset, for example:

    JAL SUBR

Execution jumps to address PC+4+offset, and the address of the jump instruction itself (PC) is stored in register R31. A procedure call counts as a jump instruction, so if delayed branching is used, the instruction following the procedure call is also executed. Therefore the return address cannot point to the instruction immediately following the call; it must point to the instruction after that. An example:

    0020  JAL  10
    0024  ADD  R1,R2,R3
    0028  SUB  R1,R2,R3
    ...
    0034  ADDI R31,R31,#8
    0038  JR   R31

Note that every instruction is four bytes long, which is an important property of a pipelined processor. In the program above, the instruction ADD R1,R2,R3 is executed before execution continues at address 20+4+10=34. The JAL instruction stores address 20 in register R31.
After the procedure has executed, program execution should continue at address 28. This means that the programmer must adjust the value in register R31; in the example program this is done with the instruction ADDI R31,R31,#8. The return jump is then executed with the instruction JR, which jumps to the address held in register R31.
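The address arithmetic in this example can be checked with a small Python sketch. The function names are hypothetical, the addresses are written as hexadecimal literals (the digits come out the same in decimal), and the sketch follows the convention stated above that JAL saves the address of the JAL itself rather than a later address.

```python
# Sketch of JAL/JR return-address handling as described for SUPER DLX.
# Assumption (from the text): JAL writes the address of the JAL itself to R31;
# other DLX implementations may write PC+4 or PC+8 instead.

INSTR_BYTES = 4  # every DLX instruction is one 32-bit word


def jal(pc: int, offset: int) -> tuple:
    """Return (jump target, value written to R31) for a JAL at address pc."""
    target = pc + INSTR_BYTES + offset
    return target, pc


def fix_return_address(r31: int) -> int:
    """ADDI R31,R31,#8: skip the JAL itself and its delay-slot instruction."""
    return r31 + 2 * INSTR_BYTES


target, r31 = jal(0x20, 0x10)
ret = fix_return_address(r31)
print(hex(target), hex(r31), hex(ret))  # 0x34 0x20 0x28
```

The factor 2 in `fix_return_address` is the whole point of the adjustment: one instruction for the JAL and one for the delay slot that has already been executed.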

To support procedure calls and returns, the following operations are needed:
- the value of the program counter must be stored in register R31;
- the program counter must be able to load a value from a register.

The DLX datapath model supports neither of these operations because the necessary buses are missing. The SUPER DLX model includes these buses. The following figure illustrates the structure of the SUPER DLX model.

How are the operations mentioned above performed in the SUPER DLX model for the instructions JAL and JR Rx?

We now know how first-level procedure calls are implemented. What happens if a procedure calls another procedure? To support procedure calls on several levels, we need a stack on which the return addresses are stored. Since our processor does not have a system stack, we must build and maintain such a stack ourselves using the available instructions. We can choose one of the registers as the stack pointer, for example register R30. The example program below calls procedure KALLE, which in turn calls procedure NISSE. The procedures manipulate only the return addresses and the stack pointer R30.

            JAL  KALLE
            ...
    KALLE:  ADDI R31,R31,#8
            SUBI R30,R30,#4
            SW   0(R30),R31
            JAL  NISSE
            NOP
            LW   R31,0(R30)
            ADDI R30,R30,#4
            JR   R31
    NISSE:  ADDI R31,R31,#8
            JR   R31

Explain what the program does.

Modify procedure NISSE so that it calls procedure OLLI:

Investigate the following program:

            ADD  R1,R0,R0
            JAL  SUBR
            ADDI R1,R1,#2
            ADD  R1,R1,R1
            ...
    SUBR:   ADDI R1,R1,#1
            ADDI R31,R31,#8
            JR   R31
            NOP

What does register R1 contain just before the instruction ADDI R1,R1,#1 is executed? (Assume R0=0.)

What does register R1 contain just before the instruction JR R31 is executed?

What does register R1 contain when the instruction ADD R1,R1,R1 has just been executed?
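The push and pop sequences that KALLE wraps around its nested JAL can be modelled in Python to show why the saved return address survives the nested call. This is a sketch under stated assumptions, not the simulator's behaviour: the class name and all addresses are hypothetical, memory is a plain dictionary, and only the stack handling is modelled.

```python
# Sketch of the software return-address stack from the KALLE example.
# Assumptions: R30 is the stack pointer and the stack grows towards lower
# addresses (SUBI before SW); memory is a dict keyed by byte address.
# The class name and the addresses used below are hypothetical.


class SoftStack:
    def __init__(self, sp: int) -> None:
        self.regs = {"R30": sp, "R31": 0}
        self.mem = {}

    def push_ra(self) -> None:
        # SUBI R30,R30,#4 ; SW 0(R30),R31
        self.regs["R30"] -= 4
        self.mem[self.regs["R30"]] = self.regs["R31"]

    def pop_ra(self) -> None:
        # LW R31,0(R30) ; ADDI R30,R30,#4
        self.regs["R31"] = self.mem[self.regs["R30"]]
        self.regs["R30"] += 4


s = SoftStack(sp=0x100)
s.regs["R31"] = 0x28   # outer return address, already adjusted with ADDI #8
s.push_ra()            # save it before the nested JAL overwrites R31
s.regs["R31"] = 0x4C   # the nested JAL overwrites R31 (hypothetical address)
s.pop_ra()             # restore the outer return address
print(hex(s.regs["R31"]), hex(s.regs["R30"]))  # 0x28 0x100
```

Each push is matched by exactly one pop, so the stack pointer returns to its starting value when the procedure returns.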

Start the simulator and choose the datapath model SUPER DLX. The program is in the file exempel11.s. Load it and reset the counter. Click on the register file; a window showing the contents of the register file should pop up.

Give clock pulses until the JAL instruction is being fetched from memory and is in the IF stage. What is the address of the JAL instruction (at which address is it located in memory)?

Give clock pulses until the JAL instruction reaches the MEM stage. What does register R31 contain?

Give a clock pulse. What does register R31 contain now? Why?

The first instruction of procedure SUBR has already been fetched. In which stage is it?

Give clock pulses until the instruction JR R31 is in the ID stage. From where in the datapath is the value of register R31 obtained? Why? What is the return address?

Give a clock pulse. What is the value of the program counter now?

Execute the program to the end with the command RUN. What is the value of register R1? Is it the same as the value you calculated earlier?

Now we investigate how multi-level procedure calls are implemented. Read the following example program carefully:

            ADDI R1,R0,#1
            ADDI R20,R0,#78
            ADDI R2,R0,#2
            ADD  R3,R1,R1
            JAL  SUBR
            NOP
            ADD  R4,R3,R0
            ...
    SUBR:   SUBI R2,R2,#1
            ADDI R3,R3,#1
            BEQZ R2,BACK
            ADDI R31,R31,#8
            SUBI R20,R20,#4
            SW   0(R20),R31
            JAL  SUBR
            NOP
            LW   R31,0(R20)
            ADDI R20,R20,#4
    BACK:   JR   R31

Which register acts as the stack pointer?

How many times is procedure SUBR called?

What does register R4 contain right after the instruction ADD R4,R3,R0 has been executed?

The program is in the file exempel12.s. Load it and reset the counter.

Give clock pulses until the JAL instruction is being fetched from memory and is in the IF stage. Determine the return address from the value of the program counter.

Give clock pulses until the instruction ADDI R31,R31,#8 reaches the EX stage. What is the value at the output of the ALU? Compare it with the return address you determined. What is the value of the stack pointer now?

Give clock pulses until the procedure's own JAL instruction is being fetched from memory and is in the IF stage. What should the value of the return address be?

Give clock pulses until the procedure's JAL instruction reaches the EX stage. What is the value of the stack pointer, and why?

Give clock pulses until the instruction JR R31 reaches the ID stage. What is the value of register R31 now? To which address will the program return, and why? From where is the return address obtained?

Give clock pulses until JR R31 reaches the ID stage again. To which address are we returning now, and why? From where is the return address obtained now?

Execute the program to the end with the command RUN. What is the value of register R4 now?

Confirm that it matches the value you calculated.

The stack mechanism is not supported in hardware because stack operations may cause hazards in the pipeline, which would decrease performance.

Controlling a pipelined processor

We have now studied the operations that the datapath of a pipelined processor performs. Next we investigate the principles of controlling the datapath. Controlling the datapath is fairly simple, but you should be aware of a few basic concepts on which the control is based: the instruction format and the control signals.

Instruction format

The instruction format defines how the instructions of a processor are encoded. The encoded instruction contains all the information needed to execute it. Open the link Computer Architecture Tutorial on the web pages of this exercise and read the parts that describe the instruction set of the DLX processor and its instruction formats.

For a pipelined processor it is very important that all instructions have the same length. Otherwise bubbles appear in the pipeline, for the following reason. Assume that instruction i is one word long and instruction i+1 is four words long. After instruction i has been fetched, four clock cycles are needed to fetch instruction i+1. While instruction i+1 is being fetched, some stages of the pipeline do nothing useful, so the parallelism of the pipeline is not fully utilized.

Instruction decoding and control signals

One way to build the control would be to add to each stage a 32-bit register holding the instruction. An instruction decoder at each stage could then form the control signals for the functional units of that stage. For example, in the EX stage both the ALU and the multiplexers MX4 and MX5 need control signals.
We could use a PLA (Programmable Logic Array) or a ROM to form the control signals for the ALU and the multiplexers from the instruction. However, this is not a good idea. The DLX processor has approximately 100 instructions, and the delay through a PLA depends strongly on the number of minterms in the switching function. Let us call this delay $T_{PLA}$. If the delay of the ALU is $T_{ALU}$, it takes at least $T_{PLA} + T_{ALU}$ before the ALU produces stable outputs. The clock cycle time of the pipeline is determined by its slowest stage, and the ALU stage often turns out to be the slowest, so instruction decoding cannot be done in this stage.

Instead, instruction decoding for all pipeline stages must be done in the ID stage. Thanks to the careful design of the instruction formats, decoding and register reading can be done in parallel in the ID stage. Why is this possible? (Investigate the instruction formats!)

Instruction decoding is usually arranged so that all control signals are generated already in the ID stage. The control signals are carried along the pipeline in control signal registers, which exist in every stage. The SUPER DLX processor studied in this exercise uses 36 control signals; we investigate only a few of them.
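To make the timing argument concrete, the following Python sketch computes the clock period (the delay of the slowest stage) for both placements of the decoder. All delay values are invented for illustration; only the structure of the comparison follows the text: $T_{PLA} + T_{ALU}$ in EX when decoding there, versus decoding in ID in parallel with the register read.

```python
# Compare the clock period when the ALU controls are decoded in EX versus ID.
# All delays are hypothetical illustration values in nanoseconds.

T_PLA = 3.0       # decoder PLA delay (hypothetical)
T_ALU = 5.0       # ALU delay (hypothetical)
T_REG_READ = 4.0  # register-file read in ID (hypothetical)
OTHER = {"IF": 4.0, "MEM": 4.5, "WB": 2.0}  # remaining stages (hypothetical)


def clock_period(id_delay: float, ex_delay: float) -> float:
    """The clock period is set by the slowest pipeline stage."""
    return max(OTHER["IF"], id_delay, ex_delay, OTHER["MEM"], OTHER["WB"])


# Decoding in EX: the ALU has to wait for the PLA output.
decode_in_ex = clock_period(id_delay=T_REG_READ, ex_delay=T_PLA + T_ALU)
# Decoding in ID, in parallel with the register read: EX only pays T_ALU.
decode_in_id = clock_period(id_delay=max(T_REG_READ, T_PLA), ex_delay=T_ALU)
print(decode_in_ex, decode_in_id)  # 8.0 5.0
```

Because the PLA runs in parallel with the register read, it only lengthens ID if it is slower than the read itself; in EX it always adds its full delay to the critical path.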

The following figure shows the instruction decoder of the SUPER DLX processor, its control signal registers and some selected control signals. The instruction decoder is located in the ID stage and is implemented as a PLA. It decodes the Opcode and Func fields of the instruction. The 36 control signals are formed from these fields and follow the instruction through the pipeline. This is possible because there is a 36-bit control register at each clock line of the pipeline.

In the EX stage, three groups of control signals are used. They are called control groups because they can contain more than one control signal. The ALU control group contains four control signals, defined in Table 1. The control groups MX4 and MX5 control the multiplexers of the EX stage; these control signals are defined in Table 2.

In the MEM stage, the memory is controlled with the signals WRITE and READ. In addition, the multiplexer MX6 is controlled with the signal MX6.

In the final WB stage, instructions that write to the register file need a control signal (REG_WRITE). However, this is not enough: the address of the target register must also travel along the pipeline. For this reason there is a 5-bit register at each clock line that holds the address of the target register. Unfortunately, the target register is not always explicitly specified in the instruction. A good example is the procedure call JAL, which always writes the return address to register R31. Therefore the ID stage has a multiplexer, controlled by the decoder, with which the address of R31 can be loaded into the 5-bit target register. This multiplexer has another important function: it selects the correct target register field of the instruction, because the target register is located in different positions in the I- and R-type instruction formats.
The multiplexer is controlled by the control group RD (see Table 3).

Important: You may be wondering why register R0 is needed as a target register. This becomes clear when you investigate how hazards are detected in the pipeline: instructions that do not write to a register must specify R0 as their target register. Keep this in mind when you define the control signals.
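The RD multiplexer and the target-register registers can be sketched in Python as a field selector plus a three-stage shift of (control, target) pairs. The field positions are an assumption based on the standard DLX layout (I-format target in bits 20-16, R-format target in bits 15-11); the RD codes are those of Table 3, and the function names and example encodings are hypothetical (opcode and func bits are left at zero).

```python
# Sketch of the RD multiplexer in the ID stage and of the (control, target)
# pair that travels with an instruction through the pipeline registers.
# Assumptions: standard DLX field layout (I-format target in bits 20-16,
# R-format target in bits 15-11), RD codes from Table 3; the function names
# and the example encodings are hypothetical.

RD_R0, RD_R31, RD_IFORMAT, RD_RFORMAT = 0b00, 0b01, 0b10, 0b11


def select_target(instr: int, rd_code: int) -> int:
    """Return the 5-bit target register number selected by the RD group."""
    if rd_code == RD_R0:
        return 0                      # instruction writes no register
    if rd_code == RD_R31:
        return 31                     # JAL always writes R31
    if rd_code == RD_IFORMAT:
        return (instr >> 16) & 0x1F   # I-format target register field
    return (instr >> 11) & 0x1F       # R-format target register field


def clock(pipe: dict, from_id) -> None:
    """One clock pulse: every entry moves one stage towards WB."""
    pipe["WB"] = pipe["MEM"]
    pipe["MEM"] = pipe["EX"]
    pipe["EX"] = from_id


add_r3_r1_r2 = (1 << 21) | (2 << 16) | (3 << 11)   # R-format, opcode/func omitted
addi_r2_r0_4 = (0 << 21) | (2 << 16) | 4           # I-format, opcode omitted

pipe = {"EX": None, "MEM": None, "WB": None}
clock(pipe, ({"REG_WRITE": 1}, select_target(add_r3_r1_r2, RD_RFORMAT)))
clock(pipe, None)
clock(pipe, None)
print(pipe["WB"])  # ({'REG_WRITE': 1}, 3)
```

Selecting the I-format field for the ADD word would return 2 (its second source register) instead of 3, which shows why the decoder must drive the RD group rather than always using one fixed field.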

Table 1
    ALU operation   Control signals 1-4
    ADD             0000
    SUB             0001
    OR              0010
    AND             0011
    XOR             0100
    LHI             0101
    SLL             0110
    SRL             0111
    SRA             1000
    EQ              1001
    NE              1010
    LT              1011
    GT              1100
    LE              1101
    GE              1110
    PASS_A          1111

Table 2
    Control group   Control signal number   Value 0         Value 1
    MX4             16                      Register file   PC
    MX5             6                       Register file   Immediate
    MX6             26                      ALU             Memory
    WRITE           10                      Don't write     Write
    READ            11                      Don't read      Read
    REG_WRITE       17                      Don't write     Write
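Table 1 can be read as a lookup from a 4-bit control value to an ALU operation. The Python sketch below is a behavioural model under stated assumptions (32-bit two's-complement operands, shift amounts taken from the low five bits of operand B, LHI placing B in the upper half-word, comparisons returning 1 or 0); the simulator's exact conventions may differ. PASS_A simply forwards operand A unchanged.

```python
# Behavioural sketch of the ALU operations listed in Table 1.
# Assumptions: 32-bit two's-complement operands; shifts use the low 5 bits
# of B; LHI places B in the upper half-word; comparisons return 1 or 0.

MASK = 0xFFFFFFFF


def signed(x: int) -> int:
    """Interpret the low 32 bits of x as a two's-complement number."""
    x &= MASK
    return x - (1 << 32) if x & 0x80000000 else x


def alu(op: int, a: int, b: int) -> int:
    """Return the 32-bit result for the 4-bit ALU control code op."""
    sa, sb, sh = signed(a), signed(b), b & 0x1F
    results = {
        0b0000: a + b,             # ADD
        0b0001: a - b,             # SUB
        0b0010: a | b,             # OR
        0b0011: a & b,             # AND
        0b0100: a ^ b,             # XOR
        0b0101: b << 16,           # LHI
        0b0110: a << sh,           # SLL
        0b0111: (a & MASK) >> sh,  # SRL (zero fill)
        0b1000: sa >> sh,          # SRA (Python >> on negative ints keeps the sign)
        0b1001: int(sa == sb),     # EQ
        0b1010: int(sa != sb),     # NE
        0b1011: int(sa < sb),      # LT
        0b1100: int(sa > sb),      # GT
        0b1101: int(sa <= sb),     # LE
        0b1110: int(sa >= sb),     # GE
        0b1111: a,                 # PASS_A
    }
    return results[op] & MASK


print(hex(alu(0b0001, 0, 1)))  # 0xffffffff
print(alu(0b1111, 0x20, 0x99) == 0x20)  # True
```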

Table 3
    RD operation    Control signals 27-28
    R0              00
    R31             01
    I-FORMAT        10
    R-FORMAT        11

Later in this exercise you will implement the decoding of a few instructions, which is why the control signals of the functional units are defined in Tables 1-3. Table 4 shows the control signal values for the ADD operation. Complete Table 4 by assigning the correct control signals to the remaining operations with the help of Tables 1-3.

Table 4
    Stage           ID    EX                MEM                 WB
    Control group   RD    ALU   MX5  MX4    WRITE  READ  MX6    REG_WRITE
    ADD             11    0000  0    0      0      0     0      1
    ADDI
    SUB
    SUBI
    OR
    LW
    SW
    BNEZ
    BEQZ

Your task is to program the contents of the DECODE PLA according to the values you filled in Table 4. You will do this with the specification language labalaba, which defines input/output functions for all operations. The file decode.ipf contains a labalaba template in which the function for the ADD operation has already been written.

- After the labalaba directive input, all input vectors are defined. An instruction is identified by comparing it with these vectors. Not all bits are necessarily needed to identify an instruction; bits that are not needed are marked with a dash (-).
- After the directive output, all output signals are named and grouped, which makes writing the functions easier.
- All functions are defined after the labalaba directive function. Every defined input vector needs a function that defines the corresponding output signals using the signals or signal groups defined in the output section.
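The idea behind the input vectors, matching an instruction against a bit pattern in which unneeded bits are don't-cares, can be sketched in Python. This is plain Python and not labalaba syntax. The ADD bit pattern (opcode 0, func 0x20) is a hypothetical encoding chosen for illustration; only the output values come from the ADD row of Table 4.

```python
# Sketch of a decoder that matches an instruction against input vectors with
# don't-care bits ('-') and returns the output control groups.
# This is plain Python, not labalaba syntax. The ADD bit pattern (opcode 0,
# func 0x20) is a hypothetical encoding; the output values are Table 4's ADD row.

from typing import Optional

ADD_PATTERN = "000000" + "-" * 15 + "00000100000"   # opcode | rs1,rs2,rd | func

ADD_CONTROLS = {"RD": "11", "ALU": "0000", "MX5": "0", "MX4": "0",
                "WRITE": "0", "READ": "0", "MX6": "0", "REG_WRITE": "1"}

DECODE_TABLE = [(ADD_PATTERN, ADD_CONTROLS)]   # other rows: your Table 4


def matches(pattern: str, bits: str) -> bool:
    """True if every position that is not '-' equals the instruction bit."""
    return len(pattern) == len(bits) and all(
        p == "-" or p == b for p, b in zip(pattern, bits))


def decode(instr: int) -> Optional[dict]:
    """Return the control groups of the first matching input vector."""
    bits = format(instr & 0xFFFFFFFF, "032b")
    for pattern, controls in DECODE_TABLE:
        if matches(pattern, bits):
            return controls
    return None   # instruction not covered by this partial table


add_r3_r1_r2 = (1 << 21) | (2 << 16) | (3 << 11) | 0x20   # hypothetical encoding
print(decode(add_r3_r1_r2)["ALU"])   # 0000
print(decode(0xFFFFFFFF))            # None
```

Each new operation corresponds to one more (input vector, outputs) pair in `DECODE_TABLE`, mirroring the rule that every input vector needs its own function.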

A template of the labalaba specification is given in the file decode.ipf. Complete the functions for all the other operations by editing this file. When the specification is finished, compile it with the command

    ./labalaba decode.ipf

Fix any syntax errors. When labalaba no longer reports errors, it creates a file called decode.opf.

Choose View Control from the menu. This shows the datapath model of the control logic of SUPER DLX. Identify the DECODE PLA, all the control signal registers and the control signals that you will be using. The instruction register is divided into fields according to the R instruction format. All visible control signals are shown in binary. Clicking on the DECODE PLA opens a load window; load your labalaba specification (decode.opf).

Each instruction is now tested separately, starting with the instruction ADDI R2,R0,#4. The instruction is in the file addi.s. Load it and reset the counter.

Give clock pulses until the instruction is in the ID stage. One register is read in the ID stage; which one?

Give a clock pulse. Where is the address of the target register? What are the states of MX4 and MX5? What should the values of the MX4 and MX5 control groups be according to Table 2? Is the value at the output of the ALU correct? If not, what is the value of the ALU control group?

Give a clock pulse to move the instruction to the MEM stage. What memory operation is performed? Is MX6 in the correct state? If not, what is the value of the MX6 control group?

Give a clock pulse. Is the control signal REG_WRITE active? Is the address of the target register correct? Will register R2 receive the correct value? If the instruction did not execute correctly, look for errors in your labalaba specification and recompile.

The remaining operations are tested in the same way. Use the following test files:
- SUB R1,R0,R0, file sub.s
- SUBI R1,R0,#2, file subi.s
- OR R20,R0,R0, file or.s
- LW R1,C(R2), file lw.s
- SW C(R2),R1, file sw.s
- BNEZ R0,4, file bnez.s
- BEQZ R0,4, file beqz.s

Fill in the following table as you proceed with your tests:

    Stage           ID    EX                MEM                 WB
    Control group   RD    ALU   MX5  MX4    WRITE  READ  MX6    REG_WRITE
    ADD             11    0000  0    0      0      0     0      1
    ADDI
    SUB
    SUBI
    OR
    LW
    SW
    BNEZ
    BEQZ

Compare your results with Table 4.

We have now implemented the control for some of the DLX instructions; implementing the other instructions should not cause any particular problems. So far, however, we have ignored an important question: how are hazards identified in the pipeline? Furthermore, we have not yet investigated the control of MX2 and MX3, the multiplexers that control bypassing, nor how the pipeline is halted when a hazard occurs. These issues are covered next.

Hazard logic

If the target register of an instruction in the EX or MEM stage is a source register of the instruction in the ID stage, either bypassing must be used or the pipeline must be halted. We start with bypassing. Because the address of the target register travels along the pipeline, comparators can compare it with the source registers of the instruction in the ID stage. The source operand considered here is specified in the field RS1. The figure below shows how a data hazard on this source operand of the instruction in the ID stage is identified using these comparisons.

The hazard logic has five input signals: USE_RS1, EQ_EX, EQ_MEM, EQ_R0 and LOAD. It has three output signals: MX2 (two signals) and STALL. The outputs select the correct control for the multiplexer MX2 (according to Table 5) or, if needed, halt the pipeline. The pipeline is halted as follows: the D flip-flops between the IF and ID stages are disabled (no new value is read in), and the harmless operation STALL is fed into the pipeline. A harmless instruction is one that writes nothing in the MEM or WB stages.

Table 5
    MX2 operation           Value
    Register file           00
    Bypass from EX-stage    01
    Bypass from MEM-stage   10

We now investigate whether all simultaneously occurring hazards can be detected, using the following test program:

    ADDI R1,R0,#1    ; R1 <- 1
    ADD  R2,R1,R0    ; R2 <- R1+R0
    ADD  R3,R1,R0    ; R3 <- R1+R0
    ADD  R4,R1,R0    ; R4 <- R1+R0

Load the program (exempel13.s).

Give clock pulses until the instruction ADD R2,R1,R0 is in the ID stage. Will that instruction get the correct value of register R1? What are the values of the MX2 control group? (Compare them with Table 5.)

Give a clock pulse. From where does the instruction in the ID stage get the correct value of register R1?

Next we investigate halting the pipeline with the following test program:

    LW   R1,18(R0)   ; R1 <- M[18]
    ADD  R2,R1,R0    ; R2 <- R1+R0
    ADD  R3,R1,R0    ; R3 <- R1+R0
    ADD  R4,R1,R0    ; R4 <- R1+R0

At what stage should the pipeline halt? Load the program (exempel14.s). In which stage is the LOAD operation when the STALL output of the HAZARD-PLA becomes 1? Give a clock pulse. Is the STALL signal still active? Make sure the program executes correctly.

Investigate the following program:

    LW   R1,18(R0)   ; R1 <- M[18]
    ADD  R1,R1,R1    ; R1 <- R1+R1
    ADD  R1,R1,R1    ; R1 <- R1+R1
    ADD  R4,R1,R1    ; R4 <- R1+R1

As far as hazard detection is concerned, what is the essential difference between this program and the previous one (exempel14.s)? Load the program (exempel14b.s) and observe whether it executes correctly.
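The hazard logic described under Hazard logic can be summarized as a small Boolean function from its five inputs to MX2 and STALL. The Python sketch below is one common way to define it, not necessarily the exact contents of the HAZARD-PLA; the precise meaning of each input is an assumption spelled out in the comments, since the defining figure is not reproduced here.

```python
# Sketch of the RS1 hazard logic as a Boolean function of its five inputs.
# Assumed input meanings (the exact definitions are in the figure, not here):
#   use_rs1 - the instruction in ID reads register RS1
#   eq_ex   - RS1 equals the target register of the instruction in EX
#   eq_mem  - RS1 equals the target register of the instruction in MEM
#   eq_r0   - RS1 is R0, which never causes a hazard
#   load    - the instruction in EX is a load (its data is not ready yet)
# MX2 codes follow Table 5. This is one common design, not necessarily the
# exact contents of the HAZARD-PLA.

MX2_REGFILE, MX2_BYPASS_EX, MX2_BYPASS_MEM = 0b00, 0b01, 0b10


def hazard_logic(use_rs1: bool, eq_ex: bool, eq_mem: bool,
                 eq_r0: bool, load: bool) -> tuple:
    """Return (MX2 control code, STALL)."""
    if not use_rs1 or eq_r0:
        return MX2_REGFILE, False
    if eq_ex:
        if load:
            return MX2_REGFILE, True   # load data not yet available: halt
        return MX2_BYPASS_EX, False    # the EX result is the newest value
    if eq_mem:
        return MX2_BYPASS_MEM, False
    return MX2_REGFILE, False
```

Checking `eq_ex` before `eq_mem` matters when both earlier instructions write the same register: the instruction in EX is the more recent one, so its value must win.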

Next we investigate the specification of the DECODE PLA. Why does the ADD operation activate both the USE_RS1 and USE_RS2 control signals? Why does the ADDI operation activate only the USE_RS1 control signal? Name an instruction that activates neither USE_RS1 nor USE_RS2.

Finally, your task is to implement the control for the instructions that perform procedure calls and returns. In a procedure call, the address of the JAL (or JALR) instruction must be stored in register R31. This value is obtained from the program counter. To write to register R31, the target register ID must be $31_{10} = 11111_2$, which is obtained by giving the RD control group the correct value (see Table 3). In addition, the value of the program counter has to pass through the ALU unchanged, without any ALU operation being performed; this ALU operation is called PASS_A. Fill in the following table:

    Stage           ID    EX                MEM                 WB
    Control group   RD    ALU   MX5  MX4    WRITE  READ  MX6    REG_WRITE
    JAL
    JR

Add the functions for the JAL and JR operations to the file decode.ipf and compile it. Test the operations by running the test program exempel11.s, and fix any errors in the specification.

Give clock pulses until the instruction ADDI R1,R1,#2 is in the MEM stage. What does register R31 contain? Is its value correct? If not, fix the errors in the labalaba specification. Execute the program to the end and make sure that register R1 contains the correct value!

You have now completed the DLX-CONTROL exercise! Return this exercise paper together with the cover sheet (BOX 518, Tietotalo building, 4th floor, corridor G).