Pipelined MIPS processor with cache controller using VHDL implementation for educational purpose


From the SelectedWorks of Kirat Pal Singh
Winter December 28, 2013

Hadeel Sh. Mahmood, College of Electrical and Electronic Techniques, Baghdad, Iraq
Safaa S. Omran, College of Electrical and Electronic Techniques, Baghdad, Iraq

This work is licensed under a Creative Commons CC_BY-NC International License. Available at:

The First International Conference of Electrical, Communication, Computer, Power and Control Engineering ICECCPCE'13, December 17-18, 2013

Pipelined MIPS Processor with Cache Controller using VHDL Implementation for Educational Purposes

Hadeel Sh. Mahmood, Computer Engineering Techniques, College of Electrical and Electronic Techniques, Baghdad, Iraq
Safaa S. Omran, Computer Engineering Techniques, College of Electrical and Electronic Techniques, Baghdad, Iraq

Abstract
This research presents the VHDL (Very High Speed Integrated Circuit Hardware Description Language) design of a direct-mapped cache controller for a pipelined MIPS (Microprocessor without Interlocked Pipeline Stages) processor. In this design, the instruction cache and the data cache are separate and both are located in the CPU (Central Processing Unit) core. A write-back policy is used, and no replacement algorithm is required. After the cache controller design was completed, it was combined with a pipelined MIPS processor and used to execute programs. The designs are synthesized using Xilinx ISE Design Suite 13.4 and simulated using the Xilinx ISim simulator.

Keywords: VHDL, MIPS, instruction cache, data cache, CPU, write back.

I. INTRODUCTION
Computer pioneers correctly predicted that programmers would want unlimited amounts of fast memory. An economical solution to that desire is a memory hierarchy, which takes advantage of locality and of trade-offs in the cost-performance of memory technologies. The principle of locality says that most programs do not access all code or data uniformly. Locality occurs in time (temporal locality) and in space (spatial locality) [1]. A typical memory hierarchy starts with a small, expensive, and relatively fast unit, called the cache, followed by a larger, less expensive, and relatively slow main memory unit [2].
Many previous works have presented VHDL designs of a pipelined MIPS processor that works with an ideal main memory that can be accessed in a single clock cycle [3-5]. However, this would be true only for a very small memory or a very slow processor. Since VHDL is one of the methods for describing a hardware design, this research uses it to implement the cache controller for the pipelined MIPS processor [6].

II. CACHE MEMORY PRINCIPLES
The cache contains a copy of portions of main memory. The processor requests a memory element by issuing the address of that element. If the requested data appears in some block in the cache, the request is called a hit. If the data is not found in the cache, the request is called a miss. On a miss, main memory is accessed to retrieve the block containing the requested data, and during that time the whole pipelined MIPS must be stalled [7].

III. CACHE DESIGN ELEMENTS
As shown in Fig. 1, the cache lies on the same chip as the processor and has the following design choices:

A. Number of Caches
In order to avoid structural hazards, the cache has been split into two: one dedicated to instructions and one dedicated to data. Both caches exist in the CPU (Central Processing Unit) core as two level-one (L1) caches. When the processor attempts to fetch an instruction from main memory, it first consults the instruction L1 cache, and when the processor attempts to fetch data from main memory, it first consults the data L1 cache.

B. Cache Addresses
There is no need to include a Memory Management Unit (MMU) in this design because the cache memory receives physical addresses, not virtual addresses, directly from the MIPS processor.

C. Mapping Function
Because there are fewer cache lines than main memory blocks, direct mapping is used in this design to map each block of main memory into only one possible cache line.

D. Write Policy
This design minimizes memory writes by using a write-back policy, in which the processor writes data to the cache only and not to main memory. When new data is written to the cache, a dirty bit associated with the line is set. Then, when a block is replaced, it is written back to main memory if and only if the dirty bit is set.
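The mapping function and the write policy described above can be illustrated with a small behavioral model. This is a sketch in Python, not the authors' VHDL. The geometry (4 lines of 4 words of 4 bytes) is taken from the paper's Cache Size subsection; the class name, the dictionary used as main memory, the write-allocate behavior on a write miss, and the `mem_block_writes` counter are assumptions made for illustration.

```python
# Behavioral sketch of a direct-mapped, write-back cache (not the authors' VHDL).
# Assumed geometry: 4 lines x 4 words x 4 bytes, byte-addressed memory.
NUM_LINES = 4
WORDS_PER_LINE = 4
BYTES_PER_WORD = 4


def split_address(addr):
    """Return (tag, line index, word select) for a byte address."""
    word = (addr // BYTES_PER_WORD) % WORDS_PER_LINE
    block = addr // (BYTES_PER_WORD * WORDS_PER_LINE)
    # Direct mapping: each memory block can live in exactly one line.
    return block // NUM_LINES, block % NUM_LINES, word


class WriteBackCache:
    def __init__(self, memory):
        self.memory = memory  # hypothetical main memory: block number -> list of words
        self.lines = [{"valid": False, "dirty": False, "tag": 0,
                       "data": [0] * WORDS_PER_LINE} for _ in range(NUM_LINES)]
        self.mem_block_writes = 0  # number of blocks written back to main memory

    def _lookup(self, tag, index):
        """Return the line for (tag, index), refilling it on a miss."""
        line = self.lines[index]
        if line["valid"] and line["tag"] == tag:
            return line  # hit: no main memory access
        if line["valid"] and line["dirty"]:
            # The replaced block is written back only if its dirty bit is set.
            old_block = line["tag"] * NUM_LINES + index
            self.memory[old_block] = list(line["data"])
            self.mem_block_writes += 1
        block = tag * NUM_LINES + index
        line["data"] = list(self.memory.get(block, [0] * WORDS_PER_LINE))
        line.update(valid=True, dirty=False, tag=tag)
        return line

    def read(self, addr):
        tag, index, word = split_address(addr)
        return self._lookup(tag, index)["data"][word]

    def write(self, addr, value):
        tag, index, word = split_address(addr)
        line = self._lookup(tag, index)  # assumption: write-allocate on a miss
        line["data"][word] = value       # the write goes to the cache only
        line["dirty"] = True
```

In this model a write to address 0x80 leaves main memory untouched until a later access to a conflicting address such as 0xC0 (same line, different tag) forces the dirty line to be written back.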

Fig. 1 Complete design of the MIPS processor with cache memories


E. Block Replacement Algorithm
For direct mapping, there is only one possible line for any particular block, so no choice of which line to replace is possible.

F. Cache Size
In order to keep the design simple, each cache is limited to 64 bytes, organized as 4 lines, where each line has 4 words and each word is 4 bytes in length. The main memory address is therefore organized as shown in Fig. 2. With 4 bytes per word, 4 words per line and 4 lines, the byte offset, the word select and the line index take 2 bits each, which leaves the 26 most significant bits of a 32-bit address as the tag.

Fig. 2 Address bit field format

IV. CACHE CONTROLLER
The cache memory is coordinated by a cache controller. All addresses are first sent by the MIPS processor to the cache controller, which decides whether the needed data is in the cache. If it is, no memory access is needed and the data is provided to the MIPS processor directly from the cache. If it is not, the cache controller fetches several consecutive words from main memory to fill the corresponding line in the cache. The data cache and the instruction cache each have their own cache controller, which consists of:

1) Finite State Machine (FSM): the FSM of the instruction cache differs from that of the data cache because instructions are fetched by the processor and executed without modification, while data is accessed for read or write. The FSM for the data cache is shown in Fig. 3, Table I explains its operation, and Table II describes the function of the nine output signals of the cache FSM.

TABLE I
TRUTH TABLE OF DATA CACHE FSM
Inputs: Hit, read, write, dirty
Outputs: stall, cachewr, cacherd, memrd, memwr, cache_data_src, mem_addr_src, rst_dly, wsel
States: Start (st0), Write cache (st1), Read cache (st2), WW 0 (st3), WW 1 (st4), WW 2 (st5), WW 3 (st6), RW 0 (st7), RW 1 (st8), RW 2 (st9), RW 3 (st10)
[Table entries not recoverable from the source]

TABLE II
THE EFFECT OF EACH OF THE NINE OUTPUT SIGNALS OF THE CACHE FSM
stall: selects between two cases. In one, main memory is accessed and the whole pipeline is stalled. In the other, cache memory is accessed and the pipeline registers are captured on the next falling edge.
cachewr: when a cache hit occurs, data supplied by the processor is written into cache memory.
cacherd: when a cache hit occurs, data is supplied to the processor from cache memory.
memrd: when a cache miss occurs, data is supplied to the cache memory from main memory.
memwr: when a cache miss occurs and the dirty bit is set, the data block supplied by cache memory is written into main memory.
cache_data_src: 0: the value fed to the cache_data_in input of cache memory comes from the processor. 1: the value fed to the cache_data_in input of cache memory comes from main memory.
mem_addr_src: selects whether the address fed to the amem input of main memory comes from the processor or equals (tag & I & 0).
rst_dly: distinguishes between a main memory read or write and no main memory activity.
wsel(1:0): 00: the first (least significant) word of the memory block is selected. 01: the second word is selected. 10: the third word is selected. 11: the fourth (most significant) word is selected.

Fig. 3 State machine for data cache
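The data cache FSM can be pictured as a fixed sequence of states per access. The sketch below is in Python and uses the state names from Table I (WW for writing a word back to main memory, RW for reading a word from it). The transition order is an assumption, since Fig. 3 is not reproduced here: a miss on a dirty line is taken to pass through the four WW states before the four RW states, and every access is taken to end in the Read cache or Write cache state. The function names are hypothetical.

```python
# Sketch of the state sequence of the data cache FSM for a single access.
# State names follow Table I; the transition order is assumed, not taken
# from the paper's Fig. 3.
WORDS_PER_LINE = 4


def access_states(hit, write, dirty):
    """Return the list of states visited for one processor access."""
    states = ["Start"]
    if not hit:
        if dirty:
            # Write the old block back to main memory, one word per state.
            states += [f"WW{w}" for w in range(WORDS_PER_LINE)]
        # Refill the line from main memory, one word per state.
        states += [f"RW{w}" for w in range(WORDS_PER_LINE)]
    states.append("Write cache" if write else "Read cache")
    return states


def wsel(state):
    """Two-bit word select driven in a WW or RW state, as a bit string."""
    return format(int(state[2:]), "02b")
```

Calling `access_states(False, False, False)` gives the six-state path Start, RW0 to RW3, Read cache, which matches the set of states the paper lists for the instruction cache FSM, where no write or dirty case exists.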


The state machine of the instruction cache is a sub-state machine of the data cache FSM that does not contain the states performing write actions. The FSM for the instruction cache is shown in Fig. 4, and Table III explains its operation.

Fig. 4 State machine for instruction cache

TABLE III
TRUTH TABLE OF INSTRUCTION CACHE FSM
Inputs: Hit, Mem_rdy
Outputs: stall, cachewr, cacherd, memrd, Rst_dly, wsel
States: Start (st0), Read cache (st1), RW 0 (st2), RW 1 (st3), RW 2 (st4), RW 3 (st5)
[Table entries not recoverable from the source]

2) Tag cache: the data tag cache contains 26 tag bits, a valid bit and a dirty bit for each data cache line. The tag bits hold the 26 most significant bits of the address being accessed, the valid bit indicates whether the cache line is valid, and the dirty bit is set when the cache line is overwritten without updating the corresponding main memory block. All valid and dirty bits are reset when the machine restarts. The instruction tag cache is similar to the data tag cache but does not have dirty bits.

V. COMPLETE MEMORY SYSTEM
After the cache controllers were designed, the data cache controller was combined with the data cache and the instruction cache controller was combined with the instruction cache. Both access the main memory, which consists of 512 bytes arranged as 2 segments, one for data and the other for instructions. Each segment has 16 blocks, each block consists of 4 words, and each word contains 4 bytes. Fig. 5 shows the connections between the data cache, the data controller and main memory, while the connections between the instruction cache, the instruction controller and main memory are shown in Fig. 6.

Fig. 5 Complete data cache system

Fig. 6 Complete instruction cache system

VI. VHDL TOP-LEVEL IMPLEMENTATION
Fig. 7 shows the pipelined MIPS connected to instruction and data memories that can be accessed in four clock cycles. The pipelined MIPS consists of 5 pipeline stages: fetch (F), decode (D), execute (EXE), memory (MEM) and write back (WB). Because the processor is faster than memory, data and instruction caches are placed inside the MIPS CPU core, as shown in Fig. 8, to reduce the gap between processor and memory speed. The instruction cache is located and accessed at the fetch (F) stage, while the data cache is located and accessed at the memory (MEM) stage.
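The tag cache described above can be sketched as a small Python model. It assumes a 32-bit byte address whose lowest 4 bits select the byte and word within a line, whose next 2 bits select one of the 4 lines, and whose remaining 26 bits form the tag. The class and method names are hypothetical, and the code is a behavioral illustration rather than the authors' VHDL.

```python
# Sketch of a tag directory with valid and (optionally) dirty bits per line.
# Assumed address layout: [tag: 26 bits][line index: 2][word: 2][byte: 2].
TAG_BITS = 26
NUM_LINES = 4


class TagDirectory:
    def __init__(self, has_dirty=True):
        self.has_dirty = has_dirty  # False models the instruction tag cache
        self.reset()

    def reset(self):
        """Clear all valid and dirty bits, as on a machine restart."""
        self.tag = [0] * NUM_LINES
        self.valid = [False] * NUM_LINES
        self.dirty = [False] * NUM_LINES

    def is_hit(self, addr):
        """A hit needs a valid line whose stored tag matches the address tag."""
        index = (addr >> 4) & 0x3
        return self.valid[index] and self.tag[index] == addr >> 6

    def fill(self, addr):
        """Record that the line for addr now holds a clean copy of its block."""
        index = (addr >> 4) & 0x3
        self.tag[index] = addr >> 6
        self.valid[index] = True
        self.dirty[index] = False

    def mark_dirty(self, addr):
        """Set the dirty bit after the line is written without updating memory."""
        if not self.has_dirty:
            raise ValueError("instruction tag cache has no dirty bits")
        self.dirty[(addr >> 4) & 0x3] = True

    def storage_bits(self):
        """Total bits of tag storage: tag + valid (+ dirty) per line."""
        return NUM_LINES * (TAG_BITS + 1 + (1 if self.has_dirty else 0))
```

Under these assumptions the data tag cache needs 4 x 28 = 112 bits and the instruction tag cache 4 x 27 = 108 bits.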


Fig. 7 Pipelined MIPS with memories that can be accessed in four clock cycles

In this design, several VHDL components are combined to form the pipelined MIPS processor, the data cache, the instruction cache and main memory. All these components are connected as shown in Fig. 8 to compose the top level. A test bench is then written and used to execute a program.

Fig. 8 Pipelined MIPS with cache memories

VII. RESULTS
The procedure shown in Fig. 9 is stored in main memory. This procedure finds the factorial of the number 6. If all instructions run correctly, it writes (0)h to memory location (84)h and (2d0)h to memory location (80)h; if not, the VHDL cache design is incorrect.

Fig. 9 Top-level test procedure

Executing this procedure on the VHDL top level gives the results shown in Fig. 10, which indicate that the design is correct. While the memwrite signal is high, (000002d0)h is stored at memory location (80)h and ( )h is stored at memory location (84)h.

After that, the two systems are compared in terms of performance and speedup, as shown in Table IV. The CPI (Clocks Per Instruction) metric is calculated using the equation

Program execution time = Instruction count × CPI × Clock period

where the instruction count in the test procedure is 90 instructions.

TABLE IV
PERFORMANCE COMPARISON BETWEEN MIPS SYSTEMS WITH REAL MEMORY
Columns: processor, program execution time, number of clock cycles, clock period, CPI, speedup
Rows: pipelined MIPS; pipelined MIPS with cache
[Only one entry, 5030 ns, survives in the source; the remaining values are not recoverable]
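The expected result of the test procedure and the CPI equation can be checked with a few lines of Python. The function names are hypothetical, the measured values from Table IV are not reproduced here, and speedup is assumed to be the ratio of the two execution times, which the paper does not define explicitly.

```python
# Sketch: expected factorial result and the CPI relation used in the paper.
def factorial(n):
    """Iterative factorial, the quantity computed by the test procedure."""
    result = 1
    for k in range(2, n + 1):
        result *= k
    return result


def cpi(execution_time_ns, instruction_count, clock_period_ns):
    """Solve execution time = instruction count x CPI x clock period for CPI."""
    return execution_time_ns / (instruction_count * clock_period_ns)


def speedup(time_without_cache_ns, time_with_cache_ns):
    """Assumed definition: ratio of execution times of the two systems."""
    return time_without_cache_ns / time_with_cache_ns
```

Here `hex(factorial(6))` evaluates to `'0x2d0'`, the value the procedure is expected to store at location (80)h. The `cpi` and `speedup` helpers take measured times as arguments; no values from the paper are assumed.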


This design was configured on a Xilinx Spartan-3AN FPGA (Field Programmable Gate Array) starter kit, and the results shown in Fig. 11 were obtained.

Fig. 10 Test procedure simulation

Fig. 11 Results on SPARTAN-3AN

VIII. CONCLUSIONS
In this paper, the VHDL design of a cache system for the pipelined MIPS has been implemented. The design uses a direct mapping function and a write-back policy. The cache system consists of two separate on-chip caches, one for data and one for instructions. A cache controller, consisting of a finite state machine and a tag directory, is associated with each cache. After the cache system design was completed, it was combined with a pipelined MIPS processor. Several programs were then executed and simulated on both systems (the pipelined MIPS system and the pipelined MIPS with cache memories system), and the desired results were obtained, which indicates the correctness of the design. Xilinx ISE Design Suite 13.4 is used for design synthesis and the Xilinx ISim simulator is used to simulate the design, which is then configured on a Xilinx Spartan-3AN FPGA starter kit.

REFERENCES
[1] J. L. Hennessy and D. A. Patterson, Computer Architecture: A Quantitative Approach, 5th ed., San Francisco, USA: Morgan Kaufmann, 2012.
[2] L. Yin, L. Tingtao, MIPS CPU Design, Shanghai Jiao Tong University, School of Software, CPU Design Report.
[3] Alekya N., P. G. Kumar, "Design of 32-Bit RISC CPU Based on MIPS", Journal of Global Research in Computer Science, vol. 2, no. 9, Sept. 2011.
[4] K. P. Singh, S. Parmar, "VHDL Implementation of a MIPS-32 Pipeline Processor", International Journal of Applied Engineering Research, vol. 7, 2012.
[5] S. P. Katke, G. P. Jain, "Design and Implementation of 5 Stages Pipelined Architecture in 32 Bit RISC Processor", International Journal of Emerging Technology and Advanced Engineering, vol. 2, no. 4, Apr.
[6] D. L. Perry, VHDL: Programming by Example, 4th ed., America: McGraw-Hill.
[7] D. A. Patterson and J. L. Hennessy, Computer Organization and Design: The Hardware/Software Interface, 4th ed., Waltham, USA: Morgan Kaufmann, 2012.
