Single-Cycle CPU VITO KLAUDIO CSC343 FALL 2015 PROF. IZIDOR GERTNER


Table of contents

1. Objective
2. Functionality
2.1. Part I (ADD/SUB)
2.2. Part II (ORI & BITWISE OPERATIONS)
2.3. Part III (LW & SW)
2.4. Part IV (BEQ)
3. Quartus II Implementation
4. DE2-70 Board
4.1. Store
4.2. Add
4.3. Subtract
4.4. AND
4.5. Not
4.6. OR
4.7. XOR
4.8. BEQ
Conclusion
Appendix

1. Objective

The objective of this laboratory is to use Quartus II to implement a single-cycle Central Processing Unit (CPU). The class discussion showed how the single-cycle CPU works conceptually, and we will use that knowledge to implement it on the DE2-70 board. The single-cycle CPU will be able to perform some of the basic instructions of the MIPS processor, which include add, subtract, OR Immediate, bitwise operations, Store-Word (SW), Load-Word (LW), and Branch-on-Equal (BEQ), using both R-type and I-type instructions. The most important feature of this CPU is that it performs every instruction in one clock cycle. The benefit of this design is that it is simple to implement; the great disadvantage is that the clock cycle must be as long as the slowest instruction requires.
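The trade-off stated above can be illustrated with a small sketch. The delay values below are hypothetical numbers chosen only for illustration; they were not measured on the DE2-70 board and do not come from the report.

```python
# Hypothetical per-instruction path delays in nanoseconds.
# These values are assumptions for illustration, not measurements.
PATH_DELAY_NS = {
    "add": 6.0,
    "sub": 6.0,
    "ori": 6.0,
    "sw": 7.0,
    "lw": 8.0,
    "beq": 5.0,
}


def single_cycle_period(delays):
    """Return the minimum clock period: the slowest instruction sets it."""
    return max(delays.values())


def slowest_instruction(delays):
    """Return the name of the instruction that lies on the critical path."""
    return max(delays, key=delays.get)


if __name__ == "__main__":
    print(slowest_instruction(PATH_DELAY_NS), single_cycle_period(PATH_DELAY_NS))
```

With these assumed numbers every instruction, including the faster ones, would take as long as the slowest entry, which is the cost of the single-cycle approach.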

2. Functionality

The CPU that we are considering in this lab completes every operation in one clock cycle. The operations are executed on the rising edge or the falling edge of the clock. The following figure shows the clock methodology:

Figure 1 Clock Methodology

All the storage elements in the single-cycle CPU are triggered by the same clock. The critical path in this kind of CPU is set by the longest instruction; the worst case is the LW (load word) instruction. We begin the analysis of our implementation by setting up a diagram with all the components needed to perform the desired instructions. Each operation will have a certain data-path depending on what the operation is doing. The schematic of a single-cycle CPU can be summarized by the following figure:

Figure 2 Single Cycle CPU Diagram

In this figure we can see the PC component on the far left-hand side. This is the Program Counter, which holds the address of the current instruction in the Instruction Memory component and is used to calculate the address of the next one. Since MIPS instructions have a fixed length of 4 bytes, we can take the output of the PC and use an adder to calculate the address of the next instruction. A multiplexer decides whether the next value of the Program Counter is simply the current value plus four, or the current value plus four plus the sign-extended 16-bit number coming from the instruction. The latter address is used when we are dealing with the BEQ operation. The instruction read from the Instruction Memory at the specified address is forwarded to the Registers component. MIPS instructions are 32 bits long. Based on the register addresses provided by the instruction, the Registers component produces two outputs. The outputs are forwarded to the ALU (Arithmetic Logic Unit), which decides what to do with its inputs based on the instruction. If the instruction tells the CPU to store data into memory, it does so by putting the data in the Data Memory component; otherwise it sends the result back to the Registers component. The signals colored blue in the diagram are Control Signals. They direct the operation of each component, while they themselves are determined by the op-code provided in the instruction.

In this laboratory we consider two instruction formats of the MIPS processor: the R-type and the I-type formats. Both formats have the same length, but the way they define the registers is different. The R-type format is used for register-to-register instructions, like add or subtract, while the I-type format is mostly used when we have to deal with immediate numbers. The following figure shows how the registers are defined by each format:

Figure 3 Instruction Types

In the R-type format, the 6 most significant bits are called op. This is the part of the instruction that defines the control signals and in this way decides what to do with the registers and their data. The next 5 most significant bits are called rs. This is an operand register in the operation. The same can be said for the next 5 bits, which are called rt. The rd register, defined by bits 15 to 11, is the destination register. The result of the operation on the two operand registers is stored in the destination register. The shamt part of the instruction stands for shift amount, and it decides the number of bits that will be shifted in a shift operation. The func part is an extension of the op part, and together they decide on the operation to be executed.

In the I-type format, we usually have only one register operand. The 11 most significant bits of this format are the same as in the R-type format. The difference starts at bits 20 to 16 (the rt field), because in this format these bits select the destination register. The last 16 bits of this format are called the immediate, and they are used when dealing with a fixed number; for example, if we want to add an integer to a register we use this type of instruction.
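The field layout described above can be sketched as two small decoders. This is a minimal illustration that assumes the standard MIPS bit positions (op 31 to 26, rs 25 to 21, rt 20 to 16, rd 15 to 11, shamt 10 to 6, func 5 to 0, immediate 15 to 0); the function names are hypothetical, and the two example words use standard MIPS encodings rather than the custom op codes chosen later in this report.

```python
def decode_r_type(word):
    """Split a 32-bit R-type instruction word into its six fields."""
    return {
        "op": (word >> 26) & 0x3F,     # bits 31..26
        "rs": (word >> 21) & 0x1F,     # bits 25..21
        "rt": (word >> 16) & 0x1F,     # bits 20..16
        "rd": (word >> 11) & 0x1F,     # bits 15..11
        "shamt": (word >> 6) & 0x1F,   # bits 10..6
        "func": word & 0x3F,           # bits 5..0
    }


def decode_i_type(word):
    """Split a 32-bit I-type instruction word into its four fields."""
    return {
        "op": (word >> 26) & 0x3F,     # bits 31..26
        "rs": (word >> 21) & 0x1F,     # bits 25..21
        "rt": (word >> 16) & 0x1F,     # bits 20..16 (destination in I-type)
        "imm16": word & 0xFFFF,        # bits 15..0
    }


if __name__ == "__main__":
    # Standard MIPS encoding of "add $3, $1, $2".
    print(decode_r_type(0x00221820))
    # Standard MIPS encoding of "ori $2, $1, 0x00FF".
    print(decode_i_type(0x342200FF))
```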

2.1. PART I (ADD/SUB)

In this part of the functionality section we discuss the two basic operations performed by our single-cycle CPU: the add and subtract instructions.

The add instruction in MIPS is: add rd, rs, rt
The subtract instruction in MIPS is: sub rd, rs, rt

These two instructions have three operands. The rs and rt operands name the registers to be added or subtracted, and the result is stored in register rd. In the CPU, these instructions follow this data-path:

Mem[PC]
R[rd] <= R[rs] + (or -) R[rt]
PC <= PC + 4

The first line fetches the instruction from memory at the address held in the program counter. The next line adds the contents of register rt to, or subtracts them from, the contents of register rs and stores the result in register rd. The last line increments the program counter by four. These two operations are called register-to-register operations and their general format is:

R[rd] <= R[rs] op R[rt]
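The register-transfer steps above can be sketched in software as follows. This is only an illustrative model, assuming a register file held in a Python list and 32-bit wrap-around arithmetic; the function name and the string operation selector are hypothetical.

```python
MASK32 = 0xFFFFFFFF  # keep results within 32 bits, like the hardware registers


def execute_r_type(regs, pc, rd, rs, rt, operation):
    """Model R[rd] <= R[rs] op R[rt] followed by PC <= PC + 4."""
    if operation == "add":
        result = regs[rs] + regs[rt]
    elif operation == "sub":
        result = regs[rs] - regs[rt]
    else:
        raise ValueError("unsupported operation: " + operation)
    regs[rd] = result & MASK32   # write back to the destination register
    return pc + 4                # address of the next instruction


if __name__ == "__main__":
    regs = [0] * 32
    regs[1], regs[2] = 5, 3
    pc = execute_r_type(regs, 0, rd=3, rs=1, rt=2, operation="add")
    print(regs[3], pc)
```

Masking with `MASK32` means a negative difference is stored in two's complement form, which is how a 32-bit register would hold it.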

The data-path that these types of operations follow can be seen graphically in the following figure:

Figure 4 ADD/SUB Data-path

In the above figure, the RW port of the register component is hard-wired to the rd field of the instruction. The RA and RB ports are connected directly to the rs and rt fields respectively. The red lines in the diagram show the data flow. The RegWr and ALUctr signals are defined by the op and func fields of the instruction. The shamt part of the instruction is not used in this case. We see that the result comes back to the register component. The data can only be stored when we have both the rising edge of the Clk signal and RegWr raised to logical 1.

In this diagram we are ignoring the fact that we need to somehow input some data into the registers before performing any operation. In Quartus II we will use a buffer to get the input. Since the number of switches on the DE2-70 board is limited to 18 and we are working with 32-bit numbers, we use a buffer and an operation code to input 8 bits at a time, four times, to get the required 32 bits. The bits are input from the 8 most significant to the 8 least significant ones. This same buffer is also used as a selector to decide whether the input is an instruction or data. The buffer is called buffer_in in the final diagram shown in the DE2-70 board section of this report.

2.2. PART II (ORI & BITWISE OPERATIONS)

In this part of the lab, we extend the previous part to include the OR Immediate and bitwise operations (AND Immediate and SLL) on the input data of our single-cycle CPU. For this kind of operation we will use the I-type format.

The ORI operation in MIPS is: ori rt, rs, imm16

Mem[PC]
R[rt] <= R[rs] or ZeroExt[imm16]
PC <= PC + 4

The logical ANDI operation in MIPS is: andi rt, rs, imm16

Mem[PC]
R[rt] <= R[rs] & ZeroExt[imm16]
PC <= PC + 4

The logical shift (SLL) operation as used in this lab is: sll rt, rs, imm16

Mem[PC]
R[rt] <= R[rs] << ZeroExt[imm16]
PC <= PC + 4

All the logical operations mentioned above need an immediate number to be computed, therefore we need to add some more components to the diagram in Part I of this section. The following figure shows the component additions to the previous data-path:

Figure 5 Logical Operations Data-path

As we can see, the first thing that we add to the previous data-path is a multiplexer to decide the destination register in the register component of the CPU. This addition is needed because the destination register in the I-type format is selected by bits 20 to 16, while these same bits were used as an operand in the R-type format. The next component to be added is the ZeroExt one. This component transforms the 16-bit immediate input into a 32-bit number, since our components work only with 32-bit numbers. The multiplexer that takes the extended output and the busB output of the register component decides which of these two inputs is forwarded to the ALU component, where the logical operations are performed. The result is sent back to the register component to be stored in the RW register.
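The additions described above (zero extension, the operand multiplexer, and the rt destination) can be sketched as a small software model. The names `alu_src_mux`, `alu_src`, and `execute_i_type_logic` are hypothetical and chosen only for this illustration; the report does not name these signals.

```python
MASK32 = 0xFFFFFFFF


def zero_extend16(imm16):
    """Widen a 16-bit immediate to 32 bits by filling the upper bits with 0."""
    return imm16 & 0xFFFF


def alu_src_mux(bus_b, extended_imm, alu_src):
    """Pick the second ALU operand: register output (0) or immediate (1)."""
    return extended_imm if alu_src else bus_b


def execute_i_type_logic(regs, pc, rt, rs, imm16, operation):
    """Model R[rt] <= R[rs] op ZeroExt[imm16] followed by PC <= PC + 4."""
    operand_b = alu_src_mux(regs[rt], zero_extend16(imm16), alu_src=1)
    if operation == "ori":
        result = regs[rs] | operand_b
    elif operation == "andi":
        result = regs[rs] & operand_b
    else:
        raise ValueError("unsupported operation: " + operation)
    regs[rt] = result & MASK32   # in the I-type format rt is the destination
    return pc + 4


if __name__ == "__main__":
    regs = [0] * 32
    regs[1] = 0x00F0
    pc = execute_i_type_logic(regs, 0, rt=2, rs=1, imm16=0x000F, operation="ori")
    print(hex(regs[2]), pc)
```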

2.3. PART III (LW & SW)

The load word (LW) and store word (SW) instructions are the next two operations that we will add to our CPU. The LW instruction is the longest one in the MIPS set of instructions, therefore the critical path of our CPU will be decided by the length of this operation.

The LW operation in MIPS is: lw rt, rs, imm16

Mem[PC]
Address <= R[rs] + SignExt[imm16]
R[rt] <= Mem[Address]
PC <= PC + 4

This operation fetches the instruction from memory at the address held in the program counter. The address from which the number will be loaded is computed by adding the sign extension of the immediate 16-bit number from the instruction to the contents of register RS. The contents of memory at the computed address are then stored in register RT. In this way we have loaded content from memory into a register. Finally, the program counter is increased by four. Since we are dealing with memory locations we need to add the data memory component to our design. The following figure shows the data-path that the LW operation follows:

Figure 6 LW Data-path

As we can see, at this point we add another multiplexer to decide whether the output will come from the ALU or from the Data Memory component of the CPU. The red lines show the data path for this operation, while the orange lines are from the operations explained previously.

Now, consider the SW operation in MIPS: sw rt, rs, imm16

Mem[PC]
Address <= R[rs] + SignExt[imm16]
Mem[Address] <= R[rt]
PC <= PC + 4

As we can see, in this case the contents of register RT are stored in the memory location given by the calculated address. The data-path of this operation is described by the following figure:

Figure 7 SW Data-path

In this diagram, like the one in Figure 6, the red lines show the data-path.
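The LW and SW register transfers given in this part can be modeled with a short sketch. It assumes, purely for illustration, that data memory is a Python dictionary keyed by address and that unwritten locations read as zero; the helper names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm16):
    """Interpret a 16-bit immediate as a signed two's complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def effective_address(regs, rs, imm16):
    """Address <= R[rs] + SignExt[imm16]."""
    return (regs[rs] + sign_extend16(imm16)) & MASK32


def lw(regs, mem, pc, rt, rs, imm16):
    """R[rt] <= Mem[Address]; PC <= PC + 4."""
    regs[rt] = mem.get(effective_address(regs, rs, imm16), 0)
    return pc + 4


def sw(regs, mem, pc, rt, rs, imm16):
    """Mem[Address] <= R[rt]; PC <= PC + 4."""
    mem[effective_address(regs, rs, imm16)] = regs[rt] & MASK32
    return pc + 4


if __name__ == "__main__":
    regs, mem = [0] * 32, {}
    regs[1], regs[2] = 0x100, 5
    pc = sw(regs, mem, 0, rt=2, rs=1, imm16=0xFFFC)   # offset of -4
    pc = lw(regs, mem, pc, rt=3, rs=1, imm16=0xFFFC)
    print(mem, regs[3], pc)
```

The negative offset in the usage lines shows why LW and SW use sign extension rather than the zero extension used by the logical immediates.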

2.4. PART IV (BEQ)

In this part of the functionality section we deal with the Branch-On-Equal operation. This operation is performed based on a condition. The condition is decided by comparing two registers, and we use a flag to record it. The flag is set to logical 1 if the two registers are equal and to logical 0 if they are not. Based on the flag, the CPU decides where to branch. If the flag is 1, we calculate the address of the next instruction, i.e. the new value of the program counter, by adding four and adding the sign-extended immediate, which must be divisible by four. Otherwise, we simply increment the program counter by four. This means that the program counter will not always increase by four in MIPS: whenever the two registers have the same contents, the program counter is incremented by the sign-extended immediate on top of the original four.

The BEQ instruction in MIPS is: beq rs, rt, imm16

Mem[PC]
Flag <= R[rs] == R[rt]
If (Flag)
    PC <= PC + 4 + SignExt({imm16, 2'b00})
Else
    PC <= PC + 4

The data-path for this instruction can be visualized by the following diagram, where the red lines show the flow of the data:

Figure 8 BEQ Data-path
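The next-address selection for BEQ can be sketched as follows. This sketch assumes the register-transfer form in which the 16-bit immediate is sign-extended and shifted left by two bits (two zero bits appended) before being added to PC + 4; the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm16):
    """Interpret a 16-bit immediate as a signed two's complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def beq_next_pc(pc, value_rs, value_rt, imm16):
    """Return the next program counter value for a BEQ instruction."""
    flag = value_rs == value_rt
    if flag:
        # Appending two zero bits to the immediate equals a left shift by 2.
        offset = sign_extend16(imm16) << 2
        return (pc + 4 + offset) & MASK32
    return (pc + 4) & MASK32


if __name__ == "__main__":
    print(beq_next_pc(0, 5, 3, 4))   # registers differ: fall through
    print(beq_next_pc(0, 5, 5, 4))   # registers equal: branch taken
```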

3. Quartus II Implementation

In this part of the report we explain, component by component, the implementation of the single-cycle CPU with three operands in the Quartus II program. The first component of the diagram is buffer_in, shown in the following figure:

Figure 9 Buffer_In

This component makes it possible to enter 32-bit values through 8 switches on the board. Since we have a limited number of switches on the DE2-70 board and we are going to use 32-bit numbers, we need to make the most of the board available to us. The buffer works based on the op input and on the clock. When op = 00 the input is stored in the 8 least significant bits of the output. The next state is op = 01, where the input is stored in bits 8 to 15 of the output. Op = 10 stores the input in bits 16 to 23 of the output, and finally op = 11 stores the input in the 8 most significant bits of the output. All the inputs are assigned on the rising edge of the clock signal. Furthermore, the buffer has an inst_or_data control signal which indicates whether the output should be considered as data or as an instruction. This is useful when we store elements in registers. After we get the output that we want, we use multiplexers to decide how the output should be sent to the 3 port ram. The multiplexers that we use are shown in the following two figures:

Figure 10 Multiplexer (5-bits)
Figure 11 Multiplexer (32-bit)

The 5-bit multiplexer is used to decide the writing address based on the type of instruction, and its control signal is hard-wired to a switch on the board. The 32-bit multiplexer is used to decide whether the 3 port ram input data should come from the buffer or from the result of an operation.
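The byte-wise loading performed by buffer_in, as described earlier in this section, can be sketched in software. This is a behavioral model only, not the Quartus II design; the class and method names are hypothetical, and the inst_or_data selector is left out for brevity.

```python
class BufferIn:
    """Behavioral sketch of a 32-bit buffer loaded 8 bits at a time."""

    def __init__(self):
        self.value = 0  # current 32-bit output

    def clock(self, op, byte):
        """On a rising clock edge, store `byte` in the lane chosen by `op`.

        op = 0b00 -> bits 0..7, 0b01 -> bits 8..15,
        op = 0b10 -> bits 16..23, 0b11 -> bits 24..31.
        """
        shift = 8 * (op & 0b11)
        lane_mask = 0xFF << shift
        cleared = self.value & ~lane_mask & 0xFFFFFFFF
        self.value = cleared | ((byte & 0xFF) << shift)
        return self.value


if __name__ == "__main__":
    buf = BufferIn()
    buf.clock(0b00, 0x05)
    print(format(buf.clock(0b10, 0x01), "08X"))
```

Each call models one press of the input clock with the switches set to `byte` and the input op code set to `op`.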

Another important element is the signextend component, which is shown in the following figure:

Figure 12 Sign Extend Component

This component extends a 16-bit input into a 32-bit output. It is necessary when we are working with I-type instructions, where 16 bits of the instruction are used as the immediate value. The next component is the 3 port ram shown in the following figure:

Figure 13 Three Port RAM

The three port ram has six inputs and two outputs. The data input is a number which comes from the 32-bit multiplexer explained before. The wraddress, rdaddress_a, and rdaddress_b inputs come from the instruction. Keep in mind that we use a multiplexer to decide which bits of the instruction go to wraddress, since this depends on the type of instruction. The wren input is a control signal that controls writing to the RAM. In order to write to a register we need both the rising edge of the clock and the wren signal at logical 1. The two outputs qa and qb are determined by rdaddress_a and rdaddress_b respectively. The outputs of the RAM go to the operating components. Output qa goes directly to both the bitwise and the branch on equal components, while the qb output goes through another 32-bit multiplexer which decides whether the input to the operating components should come from qb or from the immediate. This depends on the type of the instruction. The next figure shows the operate_bitwise component:

Figure 14 Bitwise Operations Component
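The behavior of the three port ram described above (one write port gated by wren, two independent read ports) can be sketched as a small class. This is a behavioral illustration rather than the Quartus II megafunction; the class name, the method names, and the size of 32 locations are assumptions.

```python
class ThreePortRam:
    """Behavioral sketch: one clocked write port and two read ports."""

    def __init__(self, size=32):
        self.cells = [0] * size  # assumed: 32 locations of 32 bits each

    def clock(self, data, wraddress, wren):
        """Model a rising clock edge: write only when wren is 1."""
        if wren:
            self.cells[wraddress] = data & 0xFFFFFFFF

    def read(self, rdaddress_a, rdaddress_b):
        """Return the (qa, qb) outputs for the two read addresses."""
        return self.cells[rdaddress_a], self.cells[rdaddress_b]


if __name__ == "__main__":
    ram = ThreePortRam()
    ram.clock(data=5, wraddress=1, wren=1)
    ram.clock(data=3, wraddress=2, wren=1)
    ram.clock(data=9, wraddress=1, wren=0)  # ignored: write enable is off
    print(ram.read(1, 2))
```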

This component takes three inputs: the two 32-bit operands, called input_a and input_b, and the 6-bit op_code which comes directly from the instruction. The output of this component is based on the op_code. When op_code = then the component will perform an add operation and output the sum of the two operands. The next operation is sub, that is subtraction of the two inputs, which is defined by op_code = The op code for each instruction is listed below:

Op_code = => Addition
Op_code = => Subtraction
Op_code = => AND
Op_code = => NOT
Op_code = => OR
Op_code = => XOR

The next component is the Branch on Equal operation shown in the following figure:

Figure 15 Branch on Equal Component
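The operation selection performed by operate_bitwise can be sketched as a lookup from an operation selector to a function. Because the numeric op_code values are not reproduced here, this sketch uses symbolic names as a stand-in; those names, and the Python function itself, are assumptions made only for illustration.

```python
MASK32 = 0xFFFFFFFF

# Symbolic selectors stand in for the 6-bit op_code values of the design.
OPERATIONS = {
    "ADD": lambda a, b: a + b,
    "SUB": lambda a, b: a - b,
    "AND": lambda a, b: a & b,
    "NOT": lambda a, b: ~a,      # single operand: input_b is ignored
    "OR": lambda a, b: a | b,
    "XOR": lambda a, b: a ^ b,
}


def operate_bitwise(input_a, input_b, operation):
    """Apply the selected operation and truncate the result to 32 bits."""
    return OPERATIONS[operation](input_a, input_b) & MASK32


if __name__ == "__main__":
    for name in OPERATIONS:
        print(name, format(operate_bitwise(5, 3, name), "08X"))
```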

This component takes two operands called input_a and input_b and checks whether they are equal. If the inputs are not equal it increases the Program Counter by 4, which is the standard length of a MIPS instruction. If the inputs are equal it adds the regular length of a MIPS instruction and also adds the sign-extended immediate coming from the instruction. op_code = and choose_format = 1 tell the component to perform the BEQ operation. This component performs the operation and outputs the new value of the program counter on the rising edge of the clock. The next component is choose_output, shown in the following figure:

Figure 16 Choose Output Component

This component takes four inputs and decides which one to send to the 7-segment display based on the choose_out control signal, which is 2 bits long. Each input is 32 bits long. input_a is the instruction that we feed into the system and it corresponds to choose_out = 00. input_b is the value of the input when we choose it to be data and it corresponds to choose_out = 01. The next input, input_c, comes from the output of the operate_bitwise component, corresponding to choose_out = 10. The last input, input_d, is the program counter, which comes from the operate_beq component and corresponds to choose_out = 11. The output of this component is sent to a display decoder which takes 32 bits in binary and shows the result on the 7-segment display in hexadecimal representation. All these components are connected together as shown in the following figure:

Figure 17 Single Cycle CPU Full Diagram

At this point we assign pins to each input and output from the diagram and then we send the design to the board through the Programmer feature of the Quartus II program. The next part shows the operations on the board.
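The choose_output selection and the hexadecimal display step described above can be sketched as follows. The Python function names are hypothetical, and the display is assumed to show eight hexadecimal digits for a 32-bit value.

```python
def choose_output(choose_out, input_a, input_b, input_c, input_d):
    """4-to-1 multiplexer: 00 -> instruction, 01 -> data input,
    10 -> operate_bitwise result, 11 -> program counter."""
    return (input_a, input_b, input_c, input_d)[choose_out & 0b11]


def to_hex_display(value):
    """Format a 32-bit value as the eight hex digits shown on the display."""
    return format(value & 0xFFFFFFFF, "08X")


if __name__ == "__main__":
    # Example values are arbitrary placeholders, not readings from the board.
    shown = choose_output(0b10, 0x12345678, 5, 8, 4)
    print(to_hex_display(shown))
```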

4. DE2-70 Board

For the purpose of simplicity we are going to implement only some instructions, which are Add, Subtract, AND, NOT, OR, XOR, and Branch on Equal (BEQ). Furthermore, Little Endian notation is used for the input. This means that the bits are reversed, and the following figure shows how each instruction type is read:

Figure 18 Instruction Set

We start by showing the board as soon as we send the design to it and explain how the pin assignment is made. The following figure shows the scheme:

Figure 19 Pin Assignment on DE2-70 Board

In the above figure you can see that we assigned two different clock buttons. Key 0 controls the RAM clock. Using this clock we store and output results from the RAM. The next assignment, Key 1, is the input clock. Every time this button is pressed the clock produces a rising edge, which allows the input to be sent to the multiplexers, operating units, and so on. Switches 0 to 7 are assigned to the input. These are the eight bits that are sent to the buffer on every clock cycle. Switches 8 to 9 are assigned to the input op code, which controls which 8 bits of the buffer output receive the 8-bit input. Switches 11 to 12 are assigned to the display. These two switches control which output is shown on the display. Switch 13 is a selector. It chooses whether the 32-bit multiplexer outputs the data coming from the RAM register (switch off) or the data that comes from the immediate in the I-type format (switch on). Switch 14 controls the buffer_in unit. When this switch is off the output is treated as data bits; otherwise it is treated as an instruction. Switch 15 controls the type of the format. If this switch is off the processor considers the instruction as R-type; if the switch is on it considers it an I-type instruction. Switch 16 tells the 32-bit multiplexer which data to send to the RAM register. When this switch is off the RAM gets its input from the buffer; otherwise the input is fed from the result of the operating units. The last pin, Switch 17, is assigned to the RAM's write enable control signal. When this switch is off we cannot write into a register; when the switch is on and the clock is on its rising edge we can write data into a register.

We will start this laboratory by storing two numbers in two different registers and then we will perform some operations on these two numbers. Initially we will simply add the contents of the two registers and display the result. The next operation will be subtraction. After subtraction we start with the bitwise operations. We perform the AND, NOT, OR, and XOR operations in this order and display the result on the 7-segment display using the display op code. The last operation is the BEQ operation. We will try both cases, when the registers are not equal and when the registers are equal, to see how the program counter changes according to the outcome of the BEQ condition.

4.1. Store

Consider the following figure:

Figure 20 Input Number Five

This picture was taken just after the input clock was pressed. Switches 0 to 7 show the 8 least significant bits of the input that we are trying to use. The input number in binary representation is: The left square in the figure shows switch 11 being on. This means that the display op code is 01 and, according to our choice, this will show the input on the 7-segment display. As we can see, the number displayed is , which is the input in hexadecimal representation. The next step is to input the instruction to store this number in a register. Consider the following figure:

Figure 21 Store to Register 1 Instruction

This picture was taken immediately after the input clock and the RAM clock were pressed. We can see that switch 9 is on in this case. This means that the input op code is 10, so we are storing the input into bits 16 to 23 of the buffer output. To select register 1 we need bit 16 to be 1, and on our board that is switch 0. Switch 14 is also on, which means that the processor will now consider the input as an instruction and not as data. At this point we press the input clock button. The instruction is ready to be fed to the RAM, so we raise the write enable switch and press the RAM clock to store the number 5 into register 1. The 7-segment display now shows the instruction, since the display op code is 00. The hexadecimal number displayed is which, in binary representation, would be displayed as Now that we have the number 5 in register 1, we can continue by storing another number. Consider the following figure:

Figure 22 Input Number Three

By following the same steps as in the previous case we can now input another number. The next number that I chose is 3. I want to store this number in register 2, therefore I need the appropriate instruction. The following figure shows the instruction:

Figure 23 Store to Register 2 Instruction

In the above screenshot we can see that the instruction is represented on the 7-segment display as the hexadecimal number which is represented in binary format as the number This instruction tells the processor that the writing address is which corresponds to register 2 in the R-type instruction. Since the write enable signal is on, we can simply press the RAM clock to store the number 3 into register 2. The next operation that we want our CPU to compute is addition. Addition is explained in the next section of this report.

4.2. Add

In order to perform addition we need an instruction that specifies the op code for addition and the two register addresses whose contents will be added together. The following figure shows this instruction on the board:

Figure 24 Add Register 1 with Register 2 Instruction

In this figure we can see the instruction that we fed into the CPU on the seven-segment display, since the display op code is set to 00. This instruction is represented in hexadecimal as In binary representation this instruction is In this case the op code is which is the op code for addition. Then the address of register A is which corresponds to register 1, and the address of register B is which corresponds to register 2. Now we can perform addition by pressing the RAM clock with the write enable signal kept off. The result can be displayed by setting the display op code to 10. The result is (5 + 3) = 8:

Figure 25 Addition Result

4.3. Subtract

In this part of the lab we perform the subtraction operation. Since we want to keep things simple, the subtraction is again performed on the same two registers, containing the numbers 5 and 3. The following figure shows the instruction for performing this operation on the board:

Figure 26 Subtract Register 2 from Register 1 Instruction

In this figure we can see that the register numbers did not change for the subtraction operation. The only thing that we need to change is the op code. As stated before, the op code for subtraction is This can be seen in the figure, displayed in the last seven-segment display. The picture was taken right after the input clock was pressed; now it is time to press the RAM clock to perform the operation and display the result by changing the display op code to 10. The following figure shows the result:

Figure 27 Subtraction Result

As we expected, the correct result is displayed, i.e. (5 - 3) = 2.

4.4. AND

The AND instruction is a bitwise operation. The truth table for this operation is shown below:

Table 1 AND Truth Table

The inputs that we are going to AND together are the same as the ones that we have been working with so far. We have to represent them in binary in order to do bitwise operations on them. Let's compute the AND result of the numbers 3 and 5 and then check the result on the board.

Number 3 in binary =
Number 5 in binary =
AND result =

So, the hexadecimal representation of the result is Let's input the instruction for the AND operation on the board. Consider the following figure:

Figure 28 Register 1 AND Register 2 Instruction

In the above picture we see that the instruction deals with the same registers as before but the op code has changed. Now the op code is which corresponds to the number 2 in hexadecimal displayed in the first seven-segment display. This picture was taken right after the input clock was pressed. Now it's time to send the instruction to the units by pressing the RAM clock. The result is shown in the following figure:

Figure 29 AND Result

As we expected, the hexadecimal number is displayed on the seven-segment display, indicating that the AND operation was correctly computed. The next operation that we consider is the logical NOT operation.

4.5. Not

This instruction is also a bitwise operation, but it operates on only one register. For instructions that operate on only one operand we will use the input stored in register 1, that is the number 5. The NOT operation inverts the bits of the input, i.e. all the zeroes become ones and all the ones become zeroes. The following truth table summarizes the NOT operation:

Table 2 NOT Truth Table

Let's consider the number five in binary representation:

Number 5 in binary =
NOT result =

The result of the NOT operation can be represented in hexadecimal notation as FFFFFFFA. The following figure shows the instruction that we feed into the CPU through the board:

Figure 30 NOT Register 1 Instruction

This instruction shows the difference in op code. The op code now is 4 which in binary representation is , which corresponds to the NOT instruction in the bitwise operation unit in the Quartus II diagram. You might have noticed that we still have register 2 present in the instruction, but it does not affect the result since this instruction has only one operand. The following figure shows the result:

Figure 31 NOT Result

As we can see, our expectations are met, since the result on the seven-segment display is the same as the one that we calculated theoretically. So far we have achieved our goals. It is time to move to the next operation, that is the OR instruction.
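The hexadecimal value FFFFFFFA quoted for NOT 5 can be checked with a few lines of code. This sketch assumes a 32-bit word, which matches the eight hexadecimal digits of the quoted result; the helper name `not32` is hypothetical.

```python
def not32(value):
    """Invert all bits of a 32-bit value (Python integers are unbounded,
    so the result is masked back to 32 bits)."""
    return ~value & 0xFFFFFFFF


if __name__ == "__main__":
    print(format(5, "032b"))          # the input, bit by bit
    print(format(not32(5), "032b"))   # every bit inverted
    print(format(not32(5), "08X"))    # the same value in hexadecimal
```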

4.6. OR

The OR instruction is also bitwise, but this time it operates on two operands. When we OR two bits the result is one if and only if at least one of the bits is one. The following table summarizes the OR operation:

Table 3 OR Truth Table

Let's consider our two inputs in binary again:

Number 3 in binary =
Number 5 in binary =
OR result =

This result can be converted to hexadecimal representation which is: Now we will send an instruction for the OR operation from the board and check the result. The instruction is shown in the following figure:

Figure 32 Register 1 OR Register 2 Instruction

We can see that only the op code has changed again. This time it is hexadecimal 3, which in binary representation is: and this corresponds to the op code for the OR operation explained in the Quartus II section of this report. We send this instruction to the board by pressing the RAM clock. The result is shown in the figure below:

Figure 33 OR Result

As we expected, the hexadecimal number is displayed on the seven-segment display, indicating that the operation was successfully computed. The next operation that we consider is the XOR instruction.

4.7. XOR

The XOR operation again operates bitwise on two operands. This logical operation results in one if and only if the two bits are different; otherwise the result is zero. The following table summarizes the XOR operation:

Table 4 XOR Truth Table

    A B | A XOR B
    0 0 |   0
    0 1 |   1
    1 0 |   1
    1 1 |   0

We now consider our two numbers in binary and perform the XOR operation on them:

    Number 3 in binary = 0000 0000 0000 0000 0000 0000 0000 0011
    Number 5 in binary = 0000 0000 0000 0000 0000 0000 0000 0101
    XOR result         = 0000 0000 0000 0000 0000 0000 0000 0110

The XOR result is 00000006 in hexadecimal representation. We now input the instruction for the XOR operation as shown in the following figure:

Figure 34 Register 1 XOR Register 2 Instruction

The op code in this instruction is 5 in hexadecimal, which is represented in binary as 000101 and corresponds to the op code in the Quartus II section. This means that our instruction is correct and we can press the RAM clock button to check the result.

Figure 35 XOR Result

We can see that the result shown on the seven segment display is the same as the one that we computed theoretically. This means that the CPU is working correctly. The next operation that we consider is the Branch on Equal operation.
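All of the results obtained so far can be checked against a small software model of the op code table used by operate_bitwise.vhd in the appendix. The sketch below is a Python approximation, not the VHDL itself; it assumes that input_a carries register 1 (5) and input_b carries register 2 (3), and the function name `operate_bitwise` is reused here only for readability.

```python
MASK32 = 0xFFFFFFFF

# Op codes as listed in operate_bitwise.vhd (appendix 6.6).
OPERATIONS = {
    0b000000: lambda a, b: a + b,   # ADD
    0b000001: lambda a, b: a - b,   # SUB
    0b000010: lambda a, b: a & b,   # AND
    0b000011: lambda a, b: a | b,   # OR
    0b000100: lambda a, b: ~a,      # NOT (second operand ignored)
    0b000101: lambda a, b: a ^ b,   # XOR
}


def operate_bitwise(op_code: int, a: int, b: int) -> int:
    """Software model of the ALU: unknown op codes yield zero."""
    operation = OPERATIONS.get(op_code)
    if operation is None:
        return 0
    return operation(a, b) & MASK32   # wrap to 32 bits like the hardware


reg1, reg2 = 5, 3
for code in sorted(OPERATIONS):
    print(f"{code:06b} -> {operate_bitwise(code, reg1, reg2):08X}")
```

Masking the result to 32 bits mirrors the fixed register width, so a subtraction that goes negative wraps around instead of producing a negative Python integer.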

4.8. BEQ

The Branch on Equal operation checks for a specific condition between the contents of two registers and is performed through an I-type instruction. This operation increases the program counter by 4 only, if and only if the contents of the two registers are not equal. If the contents of the registers are equal, the program counter is increased by 4 plus the sign extended immediate number that we supply through the I-type instruction. We test both cases on the board in this section of the report.

The first case is when the contents of the two registers are not equal. For this purpose we use Register 1 and Register 2, which contain the numbers 5 and 3 respectively. The following figure shows the instruction to test the contents of these two registers:

Figure 36 BEQ Register 1 Register 2 Instruction

Notice that in the above picture the instruction type has changed, which we can tell from Switch 15, now set to logical 1. Furthermore, Switch 14 is also raised to logical 1, since we want to tell the multiplexer to provide the immediate number rather than the output of qb from the RAM. The instruction op code that we chose to specify BEQ is 000001, which is shown in hexadecimal representation in the first seven segment display. The register locations have not changed in this case. The other change in this instruction is the number four in the fifth seven segment display. This means that we are choosing the number four as the immediate. This instruction can be represented in binary as:

We need to check the status of the program counter at this point, before pushing the RAM clock, and then check the increment. The following figure shows the program counter just before the BEQ operation:

Figure 37 Program Counter before First BEQ Instruction

We are able to see the program counter by setting the display op code to 11, and we observe that the counter is 4 at this moment. The next thing to do is to press the RAM clock and check the program counter increment. Consider the following figure:

Figure 38 Program Counter after First BEQ Instruction

We can see that the program counter is now 8; therefore the increment is 4 and the BEQ has behaved as predicted in theory. The BEQ is working when the contents of the two registers are not equal.

Let's check whether the instruction works on two registers that have the same contents. For this case we can use any two registers except Register 1 and Register 2, since every other register in the RAM is initialized to zero. The instruction that we use now is shown in the figure below:

Figure 39 BEQ Register 8 Register 12 Instruction

We can see that the only difference between this instruction and the previous one is the two register addresses. This instruction can be written in binary representation as: which means that we are going to compare the contents of Register 8 with the contents of Register 12. Since they both hold the number 0, the result of the branch on equal comparison should be true and the increment should be 8 this time. Consider the program counter just before we press the RAM clock button:

Figure 40 Program Counter before Second BEQ Instruction

We notice that the counter is much larger than when we left it. This happened because the BEQ instruction is a very long one and therefore takes some cycles to adjust the counter. The number displayed on the seven segment display is hexadecimal C and we are expecting the CPU to add 8 to this counter. In hexadecimal 0x C + 0x = 0x

Let's press the RAM clock and see the result shown in the following figure:

Figure 41 Program Counter after Second BEQ Instruction

We can see that the program counter displays the same number that we calculated theoretically. This means that the BEQ instruction is also working correctly for our CPU. Since this was the last instruction that we consider in this laboratory, we can say that the CPU now works correctly for all operations.
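The program counter update that was tested in this section can be summarized in a short Python model. This is a sketch under the assumption that the counter follows the rule in operate_beq.vhd (counter + 4, plus the immediate when the registers match); the function name `next_pc` is hypothetical, and the starting value 0x0C in the second call is an illustrative value rather than the one shown on the board.

```python
MASK32 = 0xFFFFFFFF


def next_pc(pc: int, reg_a: int, reg_b: int, immediate: int) -> int:
    """Program counter after one BEQ, as modeled from operate_beq.vhd."""
    step = 4                      # every instruction advances by 4
    if reg_a == reg_b:            # branch taken: add the immediate as well
        step += immediate
    return (pc + step) & MASK32   # wrap like a 32-bit register


# First test in the report: registers hold 5 and 3, immediate is 4.
print(next_pc(4, 5, 3, 4))            # 8

# Equal registers (both zero), illustrative starting counter 0x0C.
print(f"{next_pc(0x0C, 0, 0, 4):08X}")  # 00000014
```

Note that the standard MIPS BEQ shifts the sign extended immediate left by two bits before adding it to PC + 4, whereas the design in this report adds the immediate directly.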

5. Conclusion

This laboratory was a perfect introduction to processor design. We introduced how the single cycle processor works in theory. Then we implemented it in Quartus II using LPM modules and VHDL code. The next step was to test it on a DE2-70 board.

The benefit of the single cycle CPU is that all instructions are completed in one clock cycle, as the name suggests. This is very beneficial since we want to do all the steps at once. Another feature of this CPU is that it takes MIPS instructions. This means that we work with three operand instructions.

The disadvantage of this type of processor is that the time to perform any operation is the time of the longest instruction. In general, load word is the longest instruction on a single cycle CPU, but in our case we did not implement the load word instruction, therefore our longest instruction was the Branch on Equal instruction. This was obvious from the increase in the program counter when we found two equal contents when comparing registers.

Overall, this was a very good experience in designing a simple CPU and making it function properly on the DE2-70 board.

6. Appendix

6.1. Buffer_in.vhd

    Library ieee;
    USE ieee.std_logic_1164.all;

    -- Assembles a 32-bit data word or instruction from four 8-bit inputs.
    -- op selects the byte lane, inst_or_data selects the target register.
    entity buffer_in is
      port(
        clock        : in  std_logic;
        inst_or_data : in  std_logic;
        op           : in  std_logic_vector(1 downto 0);
        input        : in  std_logic_vector(7 downto 0);
        output       : out std_logic_vector(31 downto 0);
        instruction  : out std_logic_vector(31 downto 0));
    end buffer_in;

    architecture arch of buffer_in is
    begin
      process(clock)
      begin
        if rising_edge(clock) then
          if(inst_or_data = '1') then
            -- write one byte of the data word
            case op is
              when "00" => output(7 downto 0) <= input;
              when "01" => output(15 downto 8) <= input;
              when "10" => output(23 downto 16) <= input;
              when "11" => output(31 downto 24) <= input;
              when others => null;  -- covers non-binary std_logic values
            end case;
          else
            -- write one byte of the instruction word
            case op is
              when "00" => instruction(7 downto 0) <= input;
              when "01" => instruction(15 downto 8) <= input;
              when "10" => instruction(23 downto 16) <= input;
              when "11" => instruction(31 downto 24) <= input;
              when others => null;  -- covers non-binary std_logic values
            end case;
          end if;
        end if;
      end process;
    end arch;
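The byte-lane behavior of buffer_in can be illustrated with a short Python model. This is a sketch only: the function name `load_byte` is hypothetical, and each call stands for one rising clock edge with a given `op` and 8-bit `input`.

```python
MASK32 = 0xFFFFFFFF


def load_byte(word: int, op: int, byte: int) -> int:
    """Replace byte lane `op` (0 = bits 7..0, 3 = bits 31..24) of `word`."""
    shift = 8 * op
    cleared = word & ~(0xFF << shift) & MASK32   # zero the selected lane
    return cleared | ((byte & 0xFF) << shift)    # drop in the new byte


# Build the word FFFFFFFA one byte at a time, lowest lane first.
word = 0
for op, byte in enumerate((0xFA, 0xFF, 0xFF, 0xFF)):
    word = load_byte(word, op, byte)

print(f"{word:08X}")   # FFFFFFFA
```

Because only the selected lane changes on each step, the other three bytes keep their previous contents, which matches the partial signal assignments in the VHDL.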

6.2. Mux32.vhd

    LIBRARY ieee;
    USE ieee.std_logic_1164.all;

    LIBRARY lpm;
    USE lpm.lpm_components.all;

    ENTITY mux32 IS
      PORT (
        data0x : IN STD_LOGIC_VECTOR (31 DOWNTO 0);
        data1x : IN STD_LOGIC_VECTOR (31 DOWNTO 0);
        sel    : IN STD_LOGIC ;
        result : OUT STD_LOGIC_VECTOR (31 DOWNTO 0)
      );
    END mux32;

    ARCHITECTURE SYN OF mux32 IS

    -- type STD_LOGIC_2D is array (NATURAL RANGE <>, NATURAL RANGE <>) of STD_LOGIC;

      SIGNAL sub_wire0 : STD_LOGIC_VECTOR (31 DOWNTO 0);
      SIGNAL sub_wire1 : STD_LOGIC ;
      SIGNAL sub_wire2 : STD_LOGIC_VECTOR (0 DOWNTO 0);
      SIGNAL sub_wire3 : STD_LOGIC_VECTOR (31 DOWNTO 0);
      SIGNAL sub_wire4 : STD_LOGIC_2D (1 DOWNTO 0, 31 DOWNTO 0);
      SIGNAL sub_wire5 : STD_LOGIC_VECTOR (31 DOWNTO 0);

    BEGIN
      sub_wire5 <= data0x(31 DOWNTO 0);
      result <= sub_wire0(31 DOWNTO 0);
      sub_wire1 <= sel;
      sub_wire2(0) <= sub_wire1;
      sub_wire3 <= data1x(31 DOWNTO 0);
      sub_wire4(1, 0) <= sub_wire3(0);
      sub_wire4(1, 1) <= sub_wire3(1);
      sub_wire4(1, 2) <= sub_wire3(2);
      sub_wire4(1, 3) <= sub_wire3(3);
      sub_wire4(1, 4) <= sub_wire3(4);
      sub_wire4(1, 5) <= sub_wire3(5);
      sub_wire4(1, 6) <= sub_wire3(6);
      sub_wire4(1, 7) <= sub_wire3(7);
      sub_wire4(1, 8) <= sub_wire3(8);
      sub_wire4(1, 9) <= sub_wire3(9);
      sub_wire4(1, 10) <= sub_wire3(10);
      sub_wire4(1, 11) <= sub_wire3(11);
      sub_wire4(1, 12) <= sub_wire3(12);
      sub_wire4(1, 13) <= sub_wire3(13);
      sub_wire4(1, 14) <= sub_wire3(14);
      sub_wire4(1, 15) <= sub_wire3(15);
      sub_wire4(1, 16) <= sub_wire3(16);
      sub_wire4(1, 17) <= sub_wire3(17);
      sub_wire4(1, 18) <= sub_wire3(18);
      sub_wire4(1, 19) <= sub_wire3(19);
      sub_wire4(1, 20) <= sub_wire3(20);
      sub_wire4(1, 21) <= sub_wire3(21);
      sub_wire4(1, 22) <= sub_wire3(22);
      sub_wire4(1, 23) <= sub_wire3(23);
      sub_wire4(1, 24) <= sub_wire3(24);
      sub_wire4(1, 25) <= sub_wire3(25);
      sub_wire4(1, 26) <= sub_wire3(26);
      sub_wire4(1, 27) <= sub_wire3(27);
      sub_wire4(1, 28) <= sub_wire3(28);
      sub_wire4(1, 29) <= sub_wire3(29);
      sub_wire4(1, 30) <= sub_wire3(30);
      sub_wire4(1, 31) <= sub_wire3(31);
      sub_wire4(0, 0) <= sub_wire5(0);
      sub_wire4(0, 1) <= sub_wire5(1);
      sub_wire4(0, 2) <= sub_wire5(2);
      sub_wire4(0, 3) <= sub_wire5(3);
      sub_wire4(0, 4) <= sub_wire5(4);
      sub_wire4(0, 5) <= sub_wire5(5);
      sub_wire4(0, 6) <= sub_wire5(6);
      sub_wire4(0, 7) <= sub_wire5(7);
      sub_wire4(0, 8) <= sub_wire5(8);
      sub_wire4(0, 9) <= sub_wire5(9);
      sub_wire4(0, 10) <= sub_wire5(10);
      sub_wire4(0, 11) <= sub_wire5(11);
      sub_wire4(0, 12) <= sub_wire5(12);
      sub_wire4(0, 13) <= sub_wire5(13);
      sub_wire4(0, 14) <= sub_wire5(14);
      sub_wire4(0, 15) <= sub_wire5(15);
      sub_wire4(0, 16) <= sub_wire5(16);
      sub_wire4(0, 17) <= sub_wire5(17);
      sub_wire4(0, 18) <= sub_wire5(18);
      sub_wire4(0, 19) <= sub_wire5(19);
      sub_wire4(0, 20) <= sub_wire5(20);
      sub_wire4(0, 21) <= sub_wire5(21);
      sub_wire4(0, 22) <= sub_wire5(22);
      sub_wire4(0, 23) <= sub_wire5(23);
      sub_wire4(0, 24) <= sub_wire5(24);
      sub_wire4(0, 25) <= sub_wire5(25);
      sub_wire4(0, 26) <= sub_wire5(26);
      sub_wire4(0, 27) <= sub_wire5(27);
      sub_wire4(0, 28) <= sub_wire5(28);
      sub_wire4(0, 29) <= sub_wire5(29);
      sub_wire4(0, 30) <= sub_wire5(30);
      sub_wire4(0, 31) <= sub_wire5(31);

      lpm_mux_component : lpm_mux
      GENERIC MAP (
        lpm_size => 2,
        lpm_type => "LPM_MUX",
        lpm_width => 32,
        lpm_widths => 1
      )
      PORT MAP (
        sel => sub_wire2,
        data => sub_wire4,
        result => sub_wire0
      );

    END SYN;

6.3. Mux5.vhd

    LIBRARY ieee;
    USE ieee.std_logic_1164.all;

    LIBRARY lpm;
    USE lpm.lpm_components.all;

    ENTITY mux5 IS
      PORT (
        data0x : IN STD_LOGIC_VECTOR (4 DOWNTO 0);
        data1x : IN STD_LOGIC_VECTOR (4 DOWNTO 0);
        sel    : IN STD_LOGIC ;
        result : OUT STD_LOGIC_VECTOR (4 DOWNTO 0)
      );
    END mux5;

    ARCHITECTURE SYN OF mux5 IS

    -- type STD_LOGIC_2D is array (NATURAL RANGE <>, NATURAL RANGE <>) of STD_LOGIC;

      SIGNAL sub_wire0 : STD_LOGIC_VECTOR (4 DOWNTO 0);
      SIGNAL sub_wire1 : STD_LOGIC ;
      SIGNAL sub_wire2 : STD_LOGIC_VECTOR (0 DOWNTO 0);
      SIGNAL sub_wire3 : STD_LOGIC_VECTOR (4 DOWNTO 0);
      SIGNAL sub_wire4 : STD_LOGIC_2D (1 DOWNTO 0, 4 DOWNTO 0);
      SIGNAL sub_wire5 : STD_LOGIC_VECTOR (4 DOWNTO 0);

    BEGIN
      sub_wire5 <= data0x(4 DOWNTO 0);
      result <= sub_wire0(4 DOWNTO 0);
      sub_wire1 <= sel;
      sub_wire2(0) <= sub_wire1;
      sub_wire3 <= data1x(4 DOWNTO 0);
      sub_wire4(1, 0) <= sub_wire3(0);
      sub_wire4(1, 1) <= sub_wire3(1);
      sub_wire4(1, 2) <= sub_wire3(2);
      sub_wire4(1, 3) <= sub_wire3(3);
      sub_wire4(1, 4) <= sub_wire3(4);
      sub_wire4(0, 0) <= sub_wire5(0);
      sub_wire4(0, 1) <= sub_wire5(1);
      sub_wire4(0, 2) <= sub_wire5(2);
      sub_wire4(0, 3) <= sub_wire5(3);
      sub_wire4(0, 4) <= sub_wire5(4);

      lpm_mux_component : lpm_mux
      GENERIC MAP (
        lpm_size => 2,
        lpm_type => "LPM_MUX",
        lpm_width => 5,
        lpm_widths => 1
      )
      PORT MAP (
        sel => sub_wire2,
        data => sub_wire4,
        result => sub_wire0
      );

    END SYN;

6.4. Ram3port.vhd

    LIBRARY ieee;
    USE ieee.std_logic_1164.all;

    LIBRARY altera_mf;
    USE altera_mf.all;

    ENTITY ram3port IS
      PORT (
        clock       : IN STD_LOGIC ;
        data        : IN STD_LOGIC_VECTOR (31 DOWNTO 0);
        rdaddress_a : IN STD_LOGIC_VECTOR (4 DOWNTO 0);
        rdaddress_b : IN STD_LOGIC_VECTOR (4 DOWNTO 0);
        wraddress   : IN STD_LOGIC_VECTOR (4 DOWNTO 0);
        wren        : IN STD_LOGIC := '0';
        qa          : OUT STD_LOGIC_VECTOR (31 DOWNTO 0);
        qb          : OUT STD_LOGIC_VECTOR (31 DOWNTO 0)
      );
    END ram3port;

    ARCHITECTURE SYN OF ram3port IS

      SIGNAL sub_wire0 : STD_LOGIC_VECTOR (31 DOWNTO 0);
      SIGNAL sub_wire1 : STD_LOGIC_VECTOR (31 DOWNTO 0);

      COMPONENT alt3pram
      GENERIC (
        indata_aclr            : STRING;
        indata_reg             : STRING;
        intended_device_family : STRING;
        lpm_type               : STRING;
        outdata_aclr_a         : STRING;
        outdata_aclr_b         : STRING;
        outdata_reg_a          : STRING;
        outdata_reg_b          : STRING;
        rdaddress_aclr_a       : STRING;
        rdaddress_aclr_b       : STRING;
        rdaddress_reg_a        : STRING;
        rdaddress_reg_b        : STRING;
        rdcontrol_aclr_a       : STRING;
        rdcontrol_aclr_b       : STRING;
        rdcontrol_reg_a        : STRING;
        rdcontrol_reg_b        : STRING;
        width                  : NATURAL;
        widthad                : NATURAL;
        write_aclr             : STRING;
        write_reg              : STRING
      );
      PORT (
        qa          : OUT STD_LOGIC_VECTOR (31 DOWNTO 0);
        outclock    : IN STD_LOGIC ;
        qb          : OUT STD_LOGIC_VECTOR (31 DOWNTO 0);
        wren        : IN STD_LOGIC ;
        inclock     : IN STD_LOGIC ;
        data        : IN STD_LOGIC_VECTOR (31 DOWNTO 0);
        rdaddress_a : IN STD_LOGIC_VECTOR (4 DOWNTO 0);
        wraddress   : IN STD_LOGIC_VECTOR (4 DOWNTO 0);
        rdaddress_b : IN STD_LOGIC_VECTOR (4 DOWNTO 0)
      );
      END COMPONENT;

    BEGIN
      qa <= sub_wire0(31 DOWNTO 0);
      qb <= sub_wire1(31 DOWNTO 0);

      alt3pram_component : alt3pram
      GENERIC MAP (
        indata_aclr => "OFF",
        indata_reg => "INCLOCK",
        intended_device_family => "Cyclone II",
        lpm_type => "alt3pram",
        outdata_aclr_a => "OFF",
        outdata_aclr_b => "OFF",
        outdata_reg_a => "OUTCLOCK",
        outdata_reg_b => "OUTCLOCK",
        rdaddress_aclr_a => "OFF",
        rdaddress_aclr_b => "OFF",
        rdaddress_reg_a => "INCLOCK",
        rdaddress_reg_b => "INCLOCK",
        rdcontrol_aclr_a => "OFF",
        rdcontrol_aclr_b => "OFF",
        rdcontrol_reg_a => "UNREGISTERED",
        rdcontrol_reg_b => "UNREGISTERED",
        width => 32,
        widthad => 5,
        write_aclr => "OFF",
        write_reg => "INCLOCK"
      )
      PORT MAP (
        outclock => clock,
        wren => wren,
        inclock => clock,
        data => data,
        rdaddress_a => rdaddress_a,
        wraddress => wraddress,
        rdaddress_b => rdaddress_b,
        qa => sub_wire0,
        qb => sub_wire1
      );

    END SYN;

6.5. signextend.vhd

    Library ieee;
    USE ieee.std_logic_1164.all;
    use ieee.numeric_std.all;

    entity signextend is
      port(
        immediate : in  std_logic_vector(15 downto 0);
        output    : out std_logic_vector(31 downto 0));
    end signextend;

    architecture arch of signextend is
    begin
      -- resize on an unsigned value pads the upper 16 bits with zeros;
      -- casting to signed instead would replicate bit 15 (true sign extension).
      output <= std_logic_vector(resize(unsigned(immediate), 32));
    end arch;
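The difference between padding a 16-bit immediate with zeros and replicating its sign bit only shows up for immediates with bit 15 set. The following Python sketch illustrates both behaviors; the function names are hypothetical and the code is a software model, not a replacement for signextend.vhd.

```python
def zero_extend16(immediate: int) -> int:
    """Widen a 16-bit value to 32 bits by padding with zeros."""
    return immediate & 0xFFFF


def sign_extend16(immediate: int) -> int:
    """Widen a 16-bit value to 32 bits by replicating bit 15."""
    immediate &= 0xFFFF
    if immediate & 0x8000:            # negative in two's complement
        return immediate | 0xFFFF0000
    return immediate


# The immediate 4 used in the BEQ tests is unaffected by the choice.
print(f"{zero_extend16(0x0004):08X}")  # 00000004
print(f"{sign_extend16(0x0004):08X}")  # 00000004

# A negative immediate such as -4 (0xFFFC) is where the two differ.
print(f"{zero_extend16(0xFFFC):08X}")  # 0000FFFC
print(f"{sign_extend16(0xFFFC):08X}")  # FFFFFFFC
```

For the tests in this report the immediate is always the small positive number 4, so both forms give the same program counter values.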

6.6. operate_bitwise.vhd

    Library ieee;
    USE ieee.std_logic_1164.all;
    USE ieee.std_logic_unsigned.all;  -- "+" and "-" on std_logic_vector

    entity operate_bitwise is
      port(
        op_code : in  std_logic_vector(5 downto 0);
        input_a : in  std_logic_vector(31 downto 0);
        input_b : in  std_logic_vector(31 downto 0);
        output  : out std_logic_vector(31 downto 0));
    end operate_bitwise;

    architecture arch of operate_bitwise is
    begin
      -- combinational: re-evaluate whenever the op code or an operand changes
      process(op_code, input_a, input_b)
      begin
        case op_code is
          when "000000" => output <= input_a + input_b;    -- ADD
          when "000001" => output <= input_a - input_b;    -- SUB
          when "000010" => output <= input_a and input_b;  -- AND
          when "000011" => output <= input_a or input_b;   -- OR
          when "000100" => output <= not(input_a);         -- NOT
          when "000101" => output <= input_a xor input_b;  -- XOR
          when others   => output <= (others => '0');      -- undefined op code
        end case;
      end process;
    end arch;

6.7. operate_beq.vhd

    Library ieee;
    USE ieee.std_logic_1164.all;
    USE ieee.std_logic_unsigned.all;  -- "+" on std_logic_vector

    entity operate_beq is
      port(
        choose_format : in  std_logic;
        clock         : in  std_logic;
        op_code       : in  std_logic_vector(5 downto 0);
        immediate     : in  std_logic_vector(31 downto 0);
        input_a       : in  std_logic_vector(31 downto 0);
        input_b       : in  std_logic_vector(31 downto 0);
        ioutput       : out std_logic_vector(31 downto 0));
    end operate_beq;

    architecture arch of operate_beq is
      -- program counter register, starts at zero
      signal instruction : std_logic_vector(31 downto 0) := (others => '0');
    begin
      -- clocked process: only the clock belongs in the sensitivity list
      process(clock)
      begin
        if rising_edge(clock) then
          instruction <= instruction + 4;
          ioutput <= instruction;
          if choose_format = '1' then
            case op_code is
              when "000001" =>
                -- XNOR is all ones exactly when input_a equals input_b
                if(not(input_a xor input_b) = x"FFFFFFFF") then
                  instruction <= instruction + immediate + 4;
                end if;
                ioutput <= instruction;
              when others => NULL;
            end case;
            ioutput <= instruction;
          end if;
        end if;
      end process;
    end arch;

6.8. choose_ouput.vhd

    Library ieee;
    USE ieee.std_logic_1164.all;

    entity choose_output is
      port(
        input_a    : in  std_logic_vector(31 downto 0);
        input_b    : in  std_logic_vector(31 downto 0);
        input_d    : in  std_logic_vector(31 downto 0);
        input_c    : in  std_logic_vector(31 downto 0);
        choose_out : in  std_logic_vector(1 downto 0);
        output     : out std_logic_vector(31 downto 0));
    end choose_output;

    architecture arch of choose_output is
    begin
      -- combinational 4-to-1 selector for the value sent to the display
      process(choose_out, input_a, input_b, input_c, input_d)
      begin
        case choose_out is
          when "00" => output <= input_a;
          when "01" => output <= input_b;
          when "10" => output <= input_c;
          when "11" => output <= input_d;
          when others => output <= (others => '0');  -- non-binary select values
        end case;
      end process;
    end arch;

6.9. dec_to_hex.vhd

    LIBRARY IEEE;
    USE IEEE.STD_LOGIC_1164.ALL;
    USE IEEE.STD_LOGIC_ARITH.ALL;
    USE IEEE.STD_LOGIC_UNSIGNED.ALL;

    -- Hexadecimal to 7 Segment Decoder for LED Display
    ENTITY dec_to_hex IS
      PORT(
        hex_digit : IN STD_LOGIC_VECTOR(3 DOWNTO 0);
        segment_a, segment_b, segment_c, segment_d,
        segment_e, segment_f, segment_g : OUT std_logic);
    END dec_to_hex;

    ARCHITECTURE a OF dec_to_hex IS
      SIGNAL segment_data : STD_LOGIC_VECTOR(6 DOWNTO 0);
    BEGIN
      PROCESS (Hex_digit)
      -- HEX to 7 segment Decoder for LED Display
      BEGIN
        -- Hex-digit is the four bit binary value to display
        -- segment_data bit order is a b c d e f g, '1' = segment lit;
        -- the patterns below are the conventional hex glyph encodings
        CASE Hex_digit IS
          WHEN "0000" => segment_data <= "1111110";
          WHEN "0001" => segment_data <= "0110000";
          WHEN "0010" => segment_data <= "1101101";
          WHEN "0011" => segment_data <= "1111001";
          WHEN "0100" => segment_data <= "0110011";
          WHEN "0101" => segment_data <= "1011011";
          WHEN "0110" => segment_data <= "1011111";
          WHEN "0111" => segment_data <= "1110000";
          WHEN "1000" => segment_data <= "1111111";
          WHEN "1001" => segment_data <= "1111011";
          WHEN "1010" => segment_data <= "1110111";
          WHEN "1011" => segment_data <= "0011111";
          WHEN "1100" => segment_data <= "1001110";
          WHEN "1101" => segment_data <= "0111101";
          WHEN "1110" => segment_data <= "1001111";
          WHEN "1111" => segment_data <= "1000111";
          WHEN OTHERS => segment_data <= "0000000";  -- blank for non-binary inputs
        END CASE;
      END PROCESS;

      -- extract segment data bits and invert
      -- LED driver circuit is inverted
      segment_a <= NOT segment_data(6);
      segment_b <= NOT segment_data(5);
      segment_c <= NOT segment_data(4);
      segment_d <= NOT segment_data(3);
      segment_e <= NOT segment_data(2);
      segment_f <= NOT segment_data(1);
      segment_g <= NOT segment_data(0);
    END a;
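To relate the decoder to the results shown earlier, the following Python sketch splits a 32-bit value into the eight 4-bit digits that a bank of hex decoders would receive, and shows the inversion applied to an active-high segment pattern. It assumes one decoder per display digit, which the listing above does not state; the function names and the sample pattern are hypothetical.

```python
def hex_digits(word: int) -> list:
    """Eight 4-bit digits of a 32-bit word, most significant first."""
    return [(word >> shift) & 0xF for shift in range(28, -4, -4)]


def active_low(segments: str) -> str:
    """Invert an active-high a..g pattern, as the NOT gates in the decoder do."""
    return "".join("0" if bit == "1" else "1" for bit in segments)


digits = hex_digits(0xFFFFFFFA)      # NOT result from section 4.5
print(digits)                        # [15, 15, 15, 15, 15, 15, 15, 10]

# Illustrative pattern: segments a..f lit, g dark.
print(active_low("1111110"))         # 0000001
```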
