
ECE 254A Advanced Computer Architecture: Supercomputers, Fall 2006
University of California, Santa Barbara
Department of Electrical and Computer Engineering
Project 2: Designing a Snoopy Cache
Ali Umut Irturk, ECE Department & ECON Department, Graduate Student
11/6/2006

Contents
1) Overview of the Project
2) Discovering the Input and Output Ports
3) Detailed Information about the Designed Caches
4) Test Benches
5) Figures
6) Codes

1) Overview of the Project

The aim of this project is to design a snoopy cache protocol that maintains coherence for multiple processors, using Verilog. In my project, I designed 7 blocks to make this cache protocol fully functional. The design modules are:

1) Cpu: There are two Cpus in my design. These design units request reads or writes from the caches.

2) Cache: There are two Caches in my design. The caches are two-way set associative. There are 8 entries in each cache, and each cache entry has 11 bits, including data, tag, update, dirty and valid bits. These design subjects are considered in detail in the following sections.

3) Memory Mapping Unit: This unit converts the virtual address to the physical address. I designed the virtual address line as 7 bits and the physical address line as 5 bits. The conversion is done by cutting the most significant two bits of the virtual address.

4) Memory Bus Controller: Because we are using different modules which can access the bus at the same time, a memory bus controller is needed.

5) Memory: The memory has 32 entries, and each entry has 10 bits.

These design modules can be seen in Figures 1-6, and the next section, Discovering the Input and Output Ports, gives very detailed information about the usage of the modules. The most important parts of the design are:

1) Two-phase clocking: This gives us the advantage of using both edges of the clock during the state changes (see the sketch after this list).

2) Snooping and Invalidation: Snooping protocols maintain coherence for multiple processors. To maintain the coherence requirement in snooping protocols, I used the write invalidate protocol.
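To make the first point concrete, here is a minimal sketch of the two-phase clocking idea; the module and signal names are illustrative, not the report's, and the counter behavior stands in for arbitrary next-state logic. The pending state is latched on the rising edge and consumed on the falling edge, mirroring the next_s/state pair used in the Cache_A code later in this report.

`timescale 1ns/100ps
// Two-phase clocking sketch (illustrative names): a full state change
// settles within one clock period by using both clock edges.
module two_phase_sketch (
    input  wire       clock,
    input  wire       rst_l,   // asynchronous active-low reset
    input  wire       go,
    output reg  [1:0] state
);
    reg [1:0] next_s;

    // Phase 1: latch the current state on the rising edge.
    always @(posedge clock)
        next_s <= state;

    // Phase 2: compute the new state on the falling edge.
    always @(negedge clock or negedge rst_l)
        if (!rst_l)
            state <= 2'd0;
        else if (go)
            state <= next_s + 2'd1;
endmodule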

In the third section, I give very detailed information about these most important parts of the project. After implementing these modules, I wrote 4 different test benches to verify that my project works properly. I designed test benches for read misses, write misses, snooping and invalidation. The detailed information is given in the fourth section of this report.

2) Discovering the Input and Output Ports

I will consider every component one by one and find its input and output ports. However, at this step I didn't specify the widths of the inputs and outputs.

a) Cpu: As I said, in my design I implemented two cpus, Cpu A and Cpu B, and two caches, Cache A and Cache B. When information is needed from the caches, or information needs to be written, one of the cpus accesses its cache. Thus, when a request for information is considered:

i) The Cpu must signal this situation with a read signal.
ii) The Cpu must indicate where the data is with address bits. (At this step, however, we are using the Memory Mapping Unit.)

When writing is considered:

iii) The Cpu must signal this situation with a write signal.
iv) The Cpu must indicate which data needs to be written with data bits.

v) The Cpu must indicate where the data will be written with address bits (the same bits used for a read request). At this step, however, we are using the Memory Mapping Unit.

This shows that there must be 4 outputs from the Cpus to the Caches (Cache inputs from the Cpu). I named them using cpu_cac_name_nameofthecpu or cpu_mmu_name_nameofthecpu. The read signal and the write signal can each be accomplished with 1 bit; the widths of the address and data bits will be decided later. This module can be seen in Figure 1.

b) Memory Mapping Unit and the Relationship between Cpus, MMU and Caches

The Memory Mapping Unit converts the virtual address to the physical address. I designed the virtual address line as 7 bits and the physical address line as 5 bits. Converting the virtual address line to the physical address line is done by cutting the most significant two bits from the virtual address line. The address bits from Cpu A or Cpu B come to the Memory Mapping Unit as an input. After converting the address bits to a physical address, the MMU sends the address bits to Cache A or Cache B. This module can be seen in Figure 2.
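A minimal sketch of this translation, assuming only what is stated above (a 7-bit virtual address, a 5-bit physical address, and a registered output); the module and signal names are illustrative:

`timescale 1ns/100ps
// Address translation sketch: the 7-bit virtual address is cut down to
// 5 bits by dropping the two most significant bits.
module mmu_sketch (
    input  wire       clock,
    input  wire       rst_l,
    input  wire [6:0] virt_add,  // virtual address from a CPU
    output reg  [4:0] phys_add   // physical address to a cache
);
    always @(posedge clock or negedge rst_l)
        if (!rst_l)
            phys_add <= 5'b0;
        else
            phys_add <= virt_add[4:0];  // bits [6:5] are discarded
endmodule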

c) Memory: If a miss occurs in the Cache after a Cpu request, the cache must access the memory to retrieve data. Thus, the memory needs outputs to the Cache:

Output Ports:

i) The requested data is sent over data bits from the Memory to the Cache. I named this mem_cac_data.
ii) Because we are dealing with different caches and we have a Memory Bus Controller, there must be a 1-bit indicator which shows the caches that the desired data is available. I named this output data_avail_memA or B.

Input Ports: The input ports of the Memory come from the Caches. As mentioned before, if a miss occurs, the memory must be accessed to retrieve the desired block, and if the Cpu wants to write information to the Memory through the Cache, there must be several outputs from the Cache to the Memory. If a miss occurs in the Cache:

i) This information must be given to the Memory by sending a read bit.
ii) The Cache must indicate where the data is with address bits.

If a write is considered:

iii) The Cache must signal this situation with a write bit.
iv) The Cache must indicate which data needs to be written with data bits.
v) If a priority write situation occurs, I defined a 1-bit signal to indicate it, priority_wrt_A or B.

This shows that there must be 5 outputs from the Cache to the Memory (Memory inputs from the Cache). They are named cac_mem_name_nameofthecache. The read bit, write bit and priority write bit can each be accomplished with 1 bit. This module can be seen in Figure 3.
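The read side of this interface can be sketched as below. This is a deliberately simplified, assumed model (a single cache, and only the 5-bit data field of each entry) rather than the full Memory module given in the Codes section:

`timescale 1ns/100ps
// Memory-side handshake sketch: a read returns the addressed entry
// together with a one-cycle data_avail flag.
module memory_sketch (
    input  wire       clock,
    input  wire       rst_l,
    input  wire       rd,          // read request from the cache
    input  wire [4:0] add,         // physical address
    output reg  [4:0] data,        // data returned to the cache
    output reg        data_avail   // "the desired data is available"
);
    reg [4:0] memarray [0:31];     // 32 entries, simplified to 5-bit data
    integer i;

    always @(posedge clock or negedge rst_l)
        if (!rst_l) begin
            for (i = 0; i < 32; i = i + 1)
                memarray[i] <= i[4:0];   // arbitrary initial contents
            data       <= 5'b0;
            data_avail <= 1'b0;
        end else if (rd) begin
            data       <= memarray[add];
            data_avail <= 1'b1;
        end else
            data_avail <= 1'b0;
endmodule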

d) Memory Bus Controller

Because we are using different modules which can access the bus at the same time, a memory bus controller is required in this project. The Memory Bus Controller receives the requests and gives control of the bus to the requesting block. The aim of the priority write bit is to raise the priority of the write-back process.

Inputs to the Memory Bus Controller:
i) The request from Cache A: bus_req_A
ii) Priority request from Cache A: priority_req_A
iii) The request from Cache B: bus_req_B
iv) Priority request from Cache B: priority_req_B
v) The request from the memory: bus_req_mem

Outputs to the Caches and Memory:
vi) The bit that shows the bus is given to Cache A: bus_A
vii) The bit that shows the bus is given to Cache B: bus_B
viii) The bit that shows the bus is given to the Memory: bus_mem

This module can be seen in Figure 4.
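The arbitration order implied by the state machine in the Codes section, which is priority write-backs first, then the memory, then Cache A, then Cache B, can be sketched combinationally as below; the signal names are illustrative, and the real controller is sequential:

`timescale 1ns/100ps
// Combinational arbitration sketch: at most one grant is active at a
// time, following the priority order described above.
module bus_arbiter_sketch (
    input  wire priority_req_a, priority_req_b,
    input  wire bus_req_a, bus_req_b, bus_req_mem,
    output reg  bus_a, bus_b, bus_mem
);
    always @(*) begin
        {bus_a, bus_b, bus_mem} = 3'b000;
        if      (priority_req_a) bus_a   = 1'b1;  // dirty write-backs win first
        else if (priority_req_b) bus_b   = 1'b1;
        else if (bus_req_mem)    bus_mem = 1'b1;  // then the memory
        else if (bus_req_a)      bus_a   = 1'b1;  // then Cache A
        else if (bus_req_b)      bus_b   = 1'b1;  // then Cache B
    end
endmodule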

e) Cache: The caches are the other important part of this design. There must be several outputs from the Cache to the Cpu and the Memory, and there are several inputs from the other design modules: Cpu, MMU, Memory and Memory Bus Controller.

The relationship between Cache and Cpu: As discussed in the Cpu part, there must be 4 inputs from the Cpu to the Cache: the read, write, address and data bits. And there must be another input from the MMU: the address bits of the physical address.

Output Ports: If the Cpu gives a read signal and sends the address of the data:

i) If the data requested by the processor appears in the Cache, this is called a hit. This information must be given to the Cpu by sending a hit bit, and the found data must be sent back to the Cpu, so the Cache needs an output to the Cpu to send data bits.
ii) If the data is not found in the Cache, the request is called a miss. The memory is then accessed to retrieve the block containing the requested data. This information must be given to the Cpu by sending a miss bit.

This shows that there must be 3 outputs from the Cache to the Cpu (Cpu inputs from the Cache). These are named cac_cpu_name_nameofthecache. The hit signal and the miss signal can each be accomplished with 1 bit. As a result, by considering the above blocks, we can draw the cache block without considering snooping and invalidation, which can be seen in Figure 5. At this point we have designed every required module for the project. I combined these modules in Figure 7, where we can see the general picture.

3) Detailed Information About the Designed Caches: Set Associative Cache Design, Snooping and Invalidation, and Write Back

Where can a block be placed in a cache? As we know, there are three different answers to this question:

1) Direct Mapped Cache Design
2) Fully Associative Cache Design
3) Set Associative Cache Design

In the first project, the design of a simple cache, I used a direct mapped cache design. In this project, I used a two-way set associative cache design to make the implementation more realistic. In this kind of cache design, a block can be placed in a restricted set of places in the cache. Here a set is a group of blocks in the cache. A block is first mapped onto a set, and then the block can be placed anywhere within that set. The set is chosen by bit selection; that is,

(Block address) MOD (Number of sets in the cache, which is 2 in this design)

For every data item there are two blocks for storage at the same index. We can picture this situation as two pages on top of each other, which gives a better understanding of the concept. But it raises another important question: which block should be replaced on a cache miss?

After a miss occurs, the cache controller must select a block to be replaced with the desired data. In our situation there are two possible blocks to replace. As we know, there are three primary strategies for selecting which block to replace:

1) Random
2) Least-Recently Used
3) First In, First Out

In this project I used the second strategy, Least-Recently Used (LRU). With this approach, we reduce the chance of throwing out information that will be needed soon. To achieve this, accesses to blocks must be recorded; I did this with the update bits in the cache entries. Relying on the past to predict the future, the block replaced is the one that has been unused for the longest time. In my design there are two pages, which means there are two possible blocks to replace. I always check the update bits to see which block was recently used; its update bit must be 1. If I write data to an index, I set its update bit to 1, and it is important to set the update bit of the same index in the other page to 0 for later.
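A minimal sketch of this update-bit bookkeeping, under the assumption of one update bit per page at each index; the module and signal names are illustrative:

`timescale 1ns/100ps
// LRU bookkeeping sketch for one index: the accessed page gets
// update = 1 and the other page gets update = 0, so the replacement
// victim is always the page whose update bit is 0.
module lru_sketch (
    input  wire clock,
    input  wire access_1,   // page 1 accessed this cycle
    input  wire access_2,   // page 2 accessed this cycle
    output reg  update_1,
    output reg  update_2,
    output wire victim      // 0 = replace page 1, 1 = replace page 2
);
    always @(posedge clock)
        if (access_1) begin
            update_1 <= 1'b1;  // mark page 1 as recently used
            update_2 <= 1'b0;  // ...and age page 2
        end else if (access_2) begin
            update_2 <= 1'b1;
            update_1 <= 1'b0;
        end

    assign victim = update_1;  // evict the not-recently-used page
endmodule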

Here comes another important question: what happens on a write? As we know, there are two basic options when writing to the cache:

1) Write Through
2) Write Back

I used write through in the simple cache design; it is easier to implement than write back. In that situation the information was written both to the block in the cache and to the block in the memory. However, this project involves other important concepts, which I discuss later in this report, so here I used the write back method. In this method the information is written only to the block in the cache; the modified cache block is written to memory only when it is replaced.

When using the write back method, a new feature must be introduced. Using dirty bits reduces the frequency of writing back blocks on replacement. This status bit indicates whether the block is dirty (modified while in the cache) or clean (not modified). If it is clean, the block is not written back on a miss, because identical information is already in memory. This is another bit in the cache entry. Thus, I used the advantage of the write back method, which is lower memory bandwidth usage. The cache entry used in this project is shown in the figure below (Figure: Cache Entry). The additions compared to the first project are the dirty bit and the update bit described above.
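For reference, the bit slices below match the ones used by the Cache_A code in the Codes section (bit 8 appears unused there); this is only a field-extraction sketch of the 11-bit entry:

// Cache entry layout: [10] valid, [9] dirty, [7] update,
// [6:5] tag, [4:0] data.
module entry_fields_sketch (
    input  wire [10:0] entry,
    output wire        valid, dirty, update,
    output wire [1:0]  tag,
    output wire [4:0]  data
);
    assign valid  = entry[10];
    assign dirty  = entry[9];
    assign update = entry[7];
    assign tag    = entry[6:5];
    assign data   = entry[4:0];
endmodule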

After deciding on the cache entry, we need to consider how a block is found if it is in the cache.

Snooping Protocols: This is the most important part of the project. Snooping protocols maintain coherence for multiple processors; the general name for these protocols is cache coherence protocols. The key to implementing a cache coherence protocol is tracking the sharing state of any data block. There are two classes of protocols:

1) Directory based
2) Snooping

In my project, we consider the snooping protocols. In this scheme, every cache that has a copy of the data from a block of physical memory also has a copy of the sharing status of the block, and no centralized state is kept. The point is that the caches are on a shared-memory bus, and all cache controllers snoop on the bus to determine whether or not they have a copy of a block that is requested on the bus.

To maintain the coherence requirement in snooping protocols, there are two methods:

1) Write invalidate protocol
2) Write update (write broadcast) protocol

In my design, I used the write invalidate protocol as described in our lectures. In the write invalidate protocol, a processor has exclusive access to a data item before it writes that item. The name is write invalidate because it invalidates other copies on a write. Exclusive access ensures that no other readable or writable copies of an item exist when the write occurs, because all other cached copies of the item have already been invalidated.

For invalidation, the processor simply acquires bus access and broadcasts the address to be invalidated on the bus. All processors continuously snoop on the bus, watching the addresses. The processors check whether the address on the bus is in their cache; if so, the corresponding data in the cache is invalidated.
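A minimal sketch of that check, using the same tag/index split as the Cache_A code; the entry input is assumed to be the one already selected by the snoop index, and the names are illustrative:

// Snoop-check sketch: a valid tag match is a snoop hit, and a hit on
// a dirty copy means a priority write-back is needed.
module snoop_check_sketch (
    input  wire [4:0]  snoop_add,      // address seen on the bus
    input  wire [10:0] entry,          // entry selected by snoop_add[4:2]
    output wire        snoop_hit,
    output wire        need_writeback
);
    wire [1:0] snoop_tag = snoop_add[1:0];

    assign snoop_hit      = (entry[6:5] == snoop_tag) && entry[10]; // tag match & valid
    assign need_writeback = snoop_hit && entry[9];                  // copy is dirty
endmodule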

Cache State Transitions: We have three states, which can be seen in Figure 9: Invalid, Shared and Exclusive.

In the Invalid state there are two possible events for the addressed cache block: a read miss and a write miss.

Suppose a Cpu requests a read and a read miss occurs. Then the read miss must be placed on the bus, and after the data is stored in the cache, the state must change to Shared.

Suppose a Cpu requests a write and a write miss occurs. Then the write miss must be placed on the bus, and after the data is stored in the cache, the state must change to Exclusive.

In the Shared state there are three possible events for the addressed cache block: a read miss, a read hit and a write miss.

Suppose a Cpu requests a read and a read miss occurs. Then the read miss must be placed on the bus. After the data is stored in the cache, the state stays Shared.

Suppose a Cpu requests a read and a read hit occurs. Then the state stays Shared.

Suppose a Cpu requests a write and a write miss occurs. Then the write miss must be placed on the bus. After the data is stored in the cache, the state must change to Exclusive.

In the Exclusive state there are four possible events for the addressed cache block: a read miss, a read hit, a write hit and a write miss.

Suppose a Cpu requests a read and a read miss occurs. Then the read miss must be placed on the bus. After the data is stored in the cache, the state must change to Shared.

Suppose a Cpu requests a read and a read hit occurs. Then the state stays Exclusive.

Suppose a Cpu requests a write and a write miss occurs. Then the write miss must be placed on the bus. After the data is stored in the cache, the state stays Exclusive.

Suppose a Cpu requests a write and a write hit occurs. Then the state stays Exclusive.

These transitions can be seen clearly in Figure 9.
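The transitions above can be summarized as a next-state function. This is a compact sketch, not the exact per-page logic of the Cache_A code:

// Next-state sketch of the Invalid/Shared/Exclusive transitions.
module msi_next_state_sketch (
    input  wire [1:0] state,
    input  wire       read_miss, read_hit, write_miss, write_hit,
    output reg  [1:0] next_state
);
    localparam INVALID = 2'd0, SHARED = 2'd1, EXCLUSIVE = 2'd2;

    always @(*) begin
        next_state = state;                          // hits leave the state alone...
        if      (read_miss)  next_state = SHARED;    // fill the block, then share it
        else if (write_miss) next_state = EXCLUSIVE; // fill the block, then own it
        else if (write_hit)  next_state = EXCLUSIVE; // ...except a write, which ends Exclusive
    end
endmodule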

4) Test Benches

1) Testing read processes: the goal is to see that the Memory Bus Controller works properly.

In my design, the cache entries are initially filled with zeros. As we know, I have two different Cpus. These Cpus send read requests at the same time to different caches: Cpu A sends a request to Cache A and Cpu B sends a request to Cache B. Because both caches are filled with zeros, read misses occur. In this situation each cache needs to access the memory to retrieve data. However, simultaneous access is not possible: the Memory Bus Controller gives control to one of the caches. The data is retrieved from the memory (then the grant is given to the other cache) and stored in the first page of the granted cache. After this process, the same read requests are sent again with the same addresses, and read hits must occur.

The process:
1) Cpu A requests the data at one address and Cpu B requests the data at another address.
2) Because both caches are filled with zeros, no tags match and read misses occur.
3) Both caches send requests to the Memory Bus Controller: bus_req_A = 1 and bus_req_B = 1.
4) First the bus is granted to Cache A.
5) cac_mem_rd_A is set to 1, which means that the Cache will read the data at the desired address. After this, the bus is granted to Cache B.
6) cac_memAdd_A is set to the requested address. The memory holds the desired data at that address; thus mem_cac_data_A is loaded with it.
7) The data is stored in page 0 of Cache A and sent to the Cpu; cac_cpu_data_A is set to the retrieved value.
8) After the data is stored in both caches, the same read request occurs again, and this results in a read hit in Cache A.

A read hit occurred after the read misses. The requests are handled by the memory bus controller; the coordination between the caches, the memory bus controller and the memory works properly.
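The stimulus pattern behind this test looks like the sketch below. The addresses are arbitrary examples, since the report's actual values were lost in transcription, and the device-under-test instantiations are omitted here (the full test bench appears in the Codes section):

`timescale 1ns / 100ps
// Stimulus-only sketch of Test Bench 1: pulse both read requests for
// one clock period, wait for the misses to be serviced, then request
// the same addresses again and expect read hits.
module tb1_stimulus_sketch;
    reg clock;
    reg cpu_mmu_rd_a, cpu_mmu_rd_b;
    reg [6:0] cpu_mmu_add_a, cpu_mmu_add_b;

    always #5 clock = ~clock;

    initial begin
        clock = 1'b0;
        cpu_mmu_rd_a = 1'b0;  cpu_mmu_rd_b = 1'b0;
        cpu_mmu_add_a = 7'b0; cpu_mmu_add_b = 7'b0;
        // first pass: both CPUs read from cold caches -> read misses
        #10 begin
            cpu_mmu_rd_a = 1'b1; cpu_mmu_add_a = 7'b0000101; // example address
            cpu_mmu_rd_b = 1'b1; cpu_mmu_add_b = 7'b0001001; // example address
        end
        #10 begin cpu_mmu_rd_a = 1'b0; cpu_mmu_rd_b = 1'b0; end
        #100;  // let the bus controller and memory service both misses
        // second pass: same addresses -> read hits expected
        #10 begin cpu_mmu_rd_a = 1'b1; cpu_mmu_rd_b = 1'b1; end
        #10 begin cpu_mmu_rd_a = 1'b0; cpu_mmu_rd_b = 1'b0; end
        #100 $finish;
    end
endmodule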

2) Testing read processes: the goal is to see that the two-way associativity works properly.

This test bench builds on Test Bench 1. In Test Bench 1, the Cpus requested data at the same address two times. What happens if the Cpus request data from a different address in between the two previously used requests? On the first requests, cache misses must occur, and the memory bus must be granted to one of the caches. After retrieving the data from the memory for both caches, the data must be written to the first pages of the caches. If the caches then request other data from a different address, the new data must be written to the second pages of the caches. Finally, if a Cpu requests the data from the first address again, there must be a hit. This shows that the two-way associativity works properly and the update bits are maintained.

1) Cpu A requests the data at one address and Cpu B requests the data at another address.
2) Because both caches are filled with zeros, no tags match and read misses occur.
3) Both caches send requests to the Memory Bus Controller: bus_req_A = 1 and bus_req_B = 1.
4) First the bus is granted to Cache A.
5) cac_mem_rd_A is set to 1, which means that the Cache will read the data at the desired address. After this, the bus is granted to Cache B.
6) cac_memAdd_A is set to the requested address. The memory holds the desired data at that address; thus mem_cac_data_A is loaded with it.
7) The data is stored in page 0 of Cache A and sent to the Cpu; cac_cpu_data_A is set to the retrieved value.
8) Cpu A requests data at a new address and Cpu B requests data at a new address.
9) No tags match and read misses occur.
10) Both caches send requests to the Memory Bus Controller: bus_req_A = 1 and bus_req_B = 1.
11) The bus is granted to Cache A again.
12) cac_mem_rd_A is set to 1, which means that the Cache will read the data at the desired address. After this, the bus is granted to Cache B.
13) cac_memAdd_A is set to the new address. The memory holds the desired data at that address; thus mem_cac_data_A is loaded with it.
14) The data is stored in page 1 of Cache A and sent to the Cpu; cac_cpu_data_A is set to the retrieved value.
15) Cpu A and Cpu B again request the data from the first addresses they used.
16) This results in a read hit in Cache A.

A read hit occurred after two read misses. Because of the two-way associative cache organization, the first data is stored in the first page and the second data in the second page, and the repeated request for the first data completes successfully.

3) Testing write processes: In this test, both Cpus request writes to their caches. The first write attempt results in a write miss; as a result the data is written to the first pages of the caches, the write back to the memory is performed, and the invalidation is broadcast. The second write attempt to the same address results in a write hit.

1) Cpu A requests a write at one address with one data value, and Cpu B requests a write at another address with another data value.
2) Write misses occur.

3) Both caches send requests to the Memory Bus Controller: bus_req_A = 1 and bus_req_B = 1.
4) First the bus is granted to Cache A.
5) cac_mem_wrt_A is set to 1 and cac_mem_data_A is set to the written value.
6) After the data is written to the cache and to the memory with the write back, the invalidation is performed.
7) Then Cpu A requests a write at the same address with the same data, and Cpu B does the same.
8) Write hits occur and no write back is performed.
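The invalidation step in this sequence can be sketched as follows, using the same tag/index split, valid-bit position and masking idiom as the Cache_A code; only one page is shown, and the names are illustrative:

// Invalidation sketch for one cache page: a matching, valid copy has
// its valid bit cleared when the other cache broadcasts an invalidate.
module invalidate_sketch (
    input wire       clock,
    input wire       invalidate_in,  // invalidation broadcast from the other cache
    input wire [4:0] snoop_add       // address being invalidated
);
    reg  [10:0] buffer_1 [0:7];              // one page of 8 entries
    wire [1:0]  snoop_tag   = snoop_add[1:0];
    wire [2:0]  snoop_index = snoop_add[4:2];

    always @(negedge clock)
        if (invalidate_in
            && buffer_1[snoop_index][6:5] == snoop_tag  // tag matches
            && buffer_1[snoop_index][10])               // copy is valid
            // clear the valid bit (bit 10) and keep the rest
            buffer_1[snoop_index] <= buffer_1[snoop_index] & 11'b01111111111;
endmodule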

4) Testing snooping and write-back: Any transition to the Exclusive state, which is required for a processor to write the block, requires a write miss to be placed on the bus, causing all other caches to make the block invalid. In addition, if some other cache had the block in the Exclusive state, that cache generates a write back, which supplies the block containing the desired address.

1) Cpu A requests a write at an address with some data.

2) A write miss occurs.
3) Cache A sends a request to the memory bus controller: bus_req_A = 1.
4) The bus is granted to Cache A.
5) cac_mem_wrt_A is set to 1 and cac_mem_data_A is set to the written value.
6) After the data is written to the cache and to the memory with the write back, the invalidation is performed.
7) Then Cpu A requests a write at the same address with different data.
8) A write hit occurs and no write back is performed; however, this makes the data dirty.
9) Then Cpu B reads the same address, which triggers snooping.
10) Cache A is required to priority write back the dirty data to memory.


5) Figures

Figure 1: The Resulting CPU Module
Figure 2: The Resulting Memory Mapping Module
Figure 3: The Resulting Memory Module
Figure 4: The Resulting Memory Bus Controller Module
Figure 5: The Resulting Cache Module
Figure 6: The Relationship between Cache A and Cache B
Figure 7: General Design of the Project
Figure 8: The address sent by the Cpu, matched against both cache pages; the Cpu Tag and Cpu Index together form the address sent by the Cpu.
Figure 9: Cache State Transitions

Figure 1: The Resulting CPU Module

Figure 2: The Resulting Memory Mapping Module

Figure 3: The Resulting Memory Module

Figure 4: The Resulting Memory Bus Controller Module

Figure 5: The Resulting Cache Module

Figure 6: The Relationship between Cache A and Cache B

Figure 7: General Design of the Project

Figure 8: The address sent by the Cpu, matched against both cache pages; the Cpu Tag and Cpu Index together form the address sent by the Cpu.

Figure 9: Cache State Transitions

6) Codes:

Cache_A

`timescale 1ns/100ps
// Module
module Cache_A(
    //Global inputs
    clock,
    rst_l,               //Asynchronous active low reset
    //Inputs from CPU
    cpu_cac_add_a, cpu_cac_rd_a, cpu_cac_wrt_a, cpu_cac_data_a,
    //Inputs from Main Memory
    mem_cac_data_a, data_avail_mema,
    //Inputs from mmu
    mmu_cac_add_a,
    //Inputs to the Memory Bus Controller
    bus_bus_bus_req_a, priority_bus_bus_req_a,
    //Snooping input ports
    snoop_b, snoop_add_b, invalidate_b,
    //Outputs to CPU
    cac_cpu_hit_a, cac_cpu_miss_a, cac_cpu_data_a,
    //Outputs to Main Memory
    cac_data_avail_mem_add_a, cac_mem_data_a, cac_mem_rd_a, cac_mem_wrt_a, priority_wrt_a,
    //Outputs from Memory Bus Controller
    bus_a,
    //Snooping output ports
    snoop_a, snoop_add_a, invalidate_a
);

// Input Ports

//Global
input clock;
input rst_l;
//CPU
input [4:0] cpu_cac_data_a;
input [4:0] cpu_cac_add_a;
input cpu_cac_rd_a;
input cpu_cac_wrt_a;
// Memory
input [4:0] mem_cac_data_a;
input data_avail_mema;
// MMU
input [4:0] mmu_cac_add_a;
//Memory Bus Controller
input bus_a;
//Snooping
input snoop_b;
input [4:0] snoop_add_b;
input invalidate_b;

// Output Ports
//CPU
output [4:0] cac_cpu_data_a;
output cac_cpu_hit_a;
output cac_cpu_miss_a;
// Memory
output [4:0] cac_mem_data_a;
output [4:0] cac_data_avail_mem_add_a;
output cac_mem_rd_a;
output cac_mem_wrt_a;
output priority_wrt_a;
//Memory Bus Controller
output bus_bus_bus_req_a;
output priority_bus_bus_req_a;
//Snooping
output snoop_a;
output [4:0] snoop_add_a;
output invalidate_a;

// Registers
//CPU
reg [4:0] cac_cpu_data_a;
reg cac_cpu_hit_a;
reg cac_cpu_miss_a;
// Memory
reg [4:0] cac_mem_data_a;
reg [4:0] cac_memadd_a;
reg cac_mem_rd_a;
reg cac_mem_wrt_a;
reg priority_wrt_a;
//Memory Bus Controller
reg bus_bus_bus_req_a;
reg priority_bus_bus_req_a;

//Snooping
reg snoop_a;
reg [4:0] snoop_add_a;
reg snoop_page_1;
reg snoop_page_2;
reg invalidate_a;
//Cache Buffer
reg [10:0] buffer_1 [0:7];
reg [10:0] buffer_2 [0:7];
//Read/Write regs
reg read_a;
reg write_a;
reg dirty;
//Cache FSM
reg [2:0] state;
reg [2:0] next_s;
reg [2:0] back_t_s;

// Nets
//Cache-CPU Interface
wire [10:0] cpu_buffer_1;   //Buffer value at current index in page 0
wire [10:0] cpu_buffer_2;   //Buffer value at current index in page 1
wire [2:0] cpu_index;       //Index value from CPU address
wire [1:0] cpu_tag;         //Tag value from CPU address
wire [1:0] cur_tag_1;
wire [1:0] cur_tag_2;
wire [4:0] add_mem;
wire [4:0] cac_data;        //Current Cache Data
wire [4:0] mem_data;
wire [1:0] snoop_tag;
wire [2:0] snoop_index;
wire [10:0] snoop_buffer_1;
wire [10:0] snoop_buffer_2;
wire [10:0] cac_data_1;
wire [10:0] cac_data_2;
wire valid_1;
wire valid_2;
wire dirty_1;
wire dirty_2;
wire update_1;
wire update_2;
//Integer
integer i;
//Parameters
parameter S0 = 0; // Initial
parameter S1 = 1; // Wait State 1
parameter S2 = 2; // Store State
parameter S3 = 3; // Wait State 2
parameter S4 = 4;
parameter S5 = 5; //priority_write state

37 //Assign //Cache-CPU assign cpu_tag = cpu_cac_add_a[1:0]; assign cur_tag_1 = cpu_buffer_1[6:5]; assign cur_tag_2 = cpu_buffer_2[6:5]; assign cpu_index = cpu_cac_add_a[4:2]; assign add_mem = cpu_cac_add_a[4:0]; assign valid_1 = cpu_buffer_1[10]; assign valid_2 = cpu_buffer_2[10]; assign dirty_1 = cpu_buffer_1[9]; assign dirty_2 = cpu_buffer_2[9]; assign update_1 = cpu_buffer_1[7]; assign update_2 = cpu_buffer_2[7]; assign cpu_buffer_1 = buffer_1[cpu_index]; assign cpu_buffer_2 = buffer_2[cpu_index]; //Cache- Memory assign cac_data_1 = buffer_1[cpu_index]; assign cac_data_2 = buffer_2[cpu_index]; assign mem_data = mem_cac_data_a[4:0]; assign snoop_tag = snoop_add_b[1:0]; assign snoop_index = snoop_add_b[4:2]; assign snoop_buffer_1 = buffer_1[snoop_index]; assign snoop_buffer_2 = buffer_2[snoop_index]; //Begin // Cache_A // // 2 phase clock clock) next_s <= state; clock or negedge rst_l) if(rst_l == 0) //Store the initials cac_mem_rd_a <= 1'b0; cac_mem_wrt_a <= 1'b0; cac_cpu_data_a <= 5'b0; cac_cpu_miss_a <= 1'b0; cac_cpu_hit_a <= 1'b0; cac_mem_data_a <= 5'b0; cac_memadd_a <= 5'b0; snoop_a <= 1'b0; invalidate_a <= 1'b0; bus_bus_bus_req_a <= 1'b0; for(i=0; i <= 7; i = i+1) //Store 0 s into both pages buffer_1[i] <= 11'b0; 37

38 buffer_2[i] <= 11'b0; else // If its not a hard reset case(next_s) // State S0 - Initial S0: Begin //store initials again cac_cpu_data_a <= 5'b0; cac_cpu_miss_a <= 1'b0; cac_cpu_hit_a <= 1'b0; cac_mem_data_a <= 5'b0; cac_memadd_a <= 5'b0; cac_mem_rd_a <= 1'b0; cac_mem_wrt_a <= 1'b0; priority_bus_bus_req_a <= 1'b0; snoop_a <= 1'b0; snoop_page_1 <= 1'b0; snoop_page_2 <= 1'b0; invalidate_a <= 1'b0; read_a <= 1'b0; write_a <= 1'b0; dirty <= 1'b0; priority_wrt_a <= 1'b0; bus_bus_bus_req_a <= 1'b0; / //If it is a read if(cpu_cac_rd_a == 1'b1) read_a <= 1'b1; if(cpu_tag == cur_tag_1) // For Page 1 //read miss in INVALID state if(valid_1 == 1'b0) $display ("Read Miss In Invalid State PAGE 1"); cac_cpu_miss_a <= 1'b1; bus_bus_bus_req_a <= 1'b1; //read hit in EXCLUSIVE state if((dirty_1 == 1'b1) & (valid_1 == 1'b1)) $display ("Read Hit In Exculsive State PAGE 1"); cac_cpu_hit_a <= 1'b1; cac_cpu_data_a <= cpu_buffer_1[4:0]; buffer_1[cpu_index] <= buffer_1[cpu_index] 11'b ; buffer_2[cpu_index] <= buffer_2[cpu_index] & 11'b ; //read hit in SHARED state 38

39 if((dirty_1 == 1'b0) & (valid_1 == 1'b1)) $display ("Read Hit In Shared State PAGE 1"); cac_cpu_hit_a <= 1'b1; cac_cpu_data_a <= cpu_buffer_1[4:0]; buffer_1[cpu_index] <= buffer_1[cpu_index] 11'b ; buffer_2[cpu_index] <= buffer_2[cpu_index] & 11'b ; if(cpu_tag == cur_tag_2) //Check Page 2 of the Cache if(valid_2 == 1'b0) $display ("Read Miss In Invalid State PAGE 2"); cac_cpu_miss_a <= 1'b1; bus_bus_bus_req_a <= 1'b1; //read hit in EXCLUSIVE state if((dirty_2 == 1'b1) & (valid_2 == 1'b1)) $display ("Read Hit In Exculsive State PAGE 2"); cac_cpu_hit_a <= 1'b1; cac_cpu_data_a <= cpu_buffer_2[4:0]; buffer_2[cpu_index] <= buffer_2[cpu_index] 11'b ; buffer_1[cpu_index] <= buffer_1[cpu_index] & 11'b ; //read hit in SHARED state if((dirty_2 == 1'b0) & (valid_2 == 1'b1)) $display ("Read Hit In Shared State PAGE 2"); cac_cpu_hit_a <= 1'b1; cac_cpu_data_a <= cpu_buffer_2[4:0]; buffer_2[cpu_index] <= buffer_2[cpu_index] 11'b ; buffer_1[cpu_index] <= buffer_1[cpu_index] & 11'b ; if((cpu_tag!= cur_tag_1) & (cpu_tag!= cur_tag_2)) $display ("Read Miss In Invalid State"); cac_cpu_miss_a <= 1'b1; bus_bus_bus_req_a <= 1'b1; //if it is a WRITE if(cpu_cac_wrt_a == 1'b1) write_a <= 1'b1; if(cpu_tag == cur_tag_1) //write miss in INVALID state if(valid_1 == 1'b0) $display ("Write Miss In Invalid State PAGE 1"); cac_cpu_miss_a <= 1'b1; 39

40 bus_bus_bus_req_a <= 1'b1; //write miss in EXCLUSIVE state else if((dirty_1 == 1'b1) & (valid_1 == 1'b1)) $display ("Write Hit In Exculsive State PAGE 1"); cac_cpu_hit_a <= 1'b1; buffer_1[cpu_index] <= {1'b1,1'b1,1'b0,1'b1,cpu_tag,cpu_cac_data_A}; buffer_2[cpu_index] <= buffer_2[cpu_index] & 11'b ; //write hit in SHARED state else if((dirty_1 == 1'b0) & (valid_1 == 1'b1)) $display ("Write Hit In Shared State PAGE 1"); buffer_1[cpu_index] <= {1'b1,1'b1,1'b0,1'b1,cpu_tag,cpu_cac_data_A}; buffer_2[cpu_index] <= buffer_2[cpu_index] & 11'b ; bus_bus_bus_req_a <= 1'b1; back_t_s <= S0; state <= S4; else if(cpu_tag == cur_tag_2) if(valid_2 == 1'b0) $display ("Write Miss In Invalid State PAGE 2"); cac_cpu_miss_a <= 1'b1; bus_bus_bus_req_a <= 1'b1; //write hit in EXCLUSIVE state else if((dirty_2 == 1'b1) & (valid_2 == 1'b1)) $display ("Write Hit In Exculsive State PAGE 2"); cac_cpu_hit_a <= 1'b1; buffer_2[cpu_index] <= {1'b1,1'b1,1'b0,1'b1,cpu_tag,cpu_cac_data_A}; buffer_1[cpu_index] <= buffer_1[cpu_index] & 11'b ; //write hit in SHARED state else if((dirty_2 == 1'b0) & (valid_2 == 1'b1)) $display ("Write Hit In Shared State PAGE 2"); buffer_2[cpu_index] <= {1'b1,1'b1,1'b0,1'b1,cpu_tag,cpu_cac_data_A}; buffer_1[cpu_index] <= buffer_1[cpu_index] & 11'b ; bus_bus_bus_req_a <= 1'b1; back_t_s <= S0; state <= S4; else if((cpu_tag!= cur_tag_1) & (cpu_tag!= cur_tag_2)) $display ("Write Miss In Invalid state"); cac_cpu_miss_a <= 1'b1; bus_bus_bus_req_a <= 1'b1; //if snooping 40

41 if(snoop_b == 1'b1) if((snoop_tag == snoop_buffer_1[6:5])&(snoop_buffer_1[10] == 1'b1)) if(snoop_buffer_1[9] == 1'b1) priority_bus_bus_req_a <= 1'b1; snoop_page_1 <= 1'b1; back_t_s <= S0; state <= S3; else else if((snoop_tag == snoop_buffer_2[6:5])&(snoop_buffer_2[10] == 1'b1)) if(snoop_buffer_2[9] == 1'b1) priority_bus_bus_req_a <= 1'b1; snoop_page_2 <= 1'b1; back_t_s <= S0; state <= S3; else else //if Invalidation if(invalidate_b == 1'b1) if((snoop_tag == snoop_buffer_1[6:5])&(snoop_buffer_1[10] == 1'b1)) buffer_1[snoop_index] <= (buffer_1[snoop_index] & 11'b ); buffer_2[snoop_index] <= (buffer_2[snoop_index] 11'b ); else if((snoop_tag == snoop_buffer_2[6:5])&(snoop_buffer_1[10] == 1'b1)) buffer_2[snoop_index] <= (buffer_2[snoop_index] & 11'b ); buffer_1[snoop_index] <= (buffer_1[snoop_index] 11'b ); else //State 0 //State S0 - Wait S0: 41

42 cac_cpu_miss_a <= 1'b0; priority_wrt_a <= 1'b0; $display ("Waiting in S0"); if(bus_a == 1'b1) cac_mem_rd_a <= 1'b1; cac_memadd_a <= add_mem; snoop_a <= 1'b1; snoop_add_a <= add_mem; bus_bus_bus_req_a <= 1'b0; state <= S2; //if invalidation else if(invalidate_b == 1'b1) if((snoop_tag == snoop_buffer_1[6:5])&(snoop_buffer_1[10] == 1'b1)) buffer_1[snoop_index] <= (buffer_1[snoop_index] & 11'b ); buffer_2[snoop_index] <= (buffer_2[snoop_index] 11'b ); if(snoop_add_b == cpu_cac_add_a) bus_bus_bus_req_a <= 1'b0; cac_cpu_miss_a <= 1'b1; else else if((snoop_tag == snoop_buffer_2[6:5])&(snoop_buffer_2[10] == 1'b1)) buffer_2[snoop_index] <= (buffer_2[snoop_index] & 11'b ); buffer_1[snoop_index] <= (buffer_1[snoop_index] 11'b ); if(snoop_add_b == cpu_cac_add_a) bus_bus_bus_req_a <= 1'b0; cac_cpu_miss_a <= 1'b1; else else else //S0 //State S2 42

43 S2: dirty <= 1'b0; snoop_a <= 1'b0; cac_mem_wrt_a <= 1'b0; cac_mem_rd_a <= 1'b0; snoop_page_1 <= 1'b0; snoop_page_2 <= 1'b0; priority_wrt_a <= 1'b0; if((data_avail_mema == 1'b1) (dirty == 1'b1)) if(update_1 == 1'b0) //Data needs to be written in page 1 if(dirty_1 == 1'b1) cac_mem_wrt_a <= 1'b1; cac_memadd_a <= add_mem; cac_mem_data_a <= cpu_cac_data_a; buffer_1[cpu_index] <= buffer_1[cpu_index] & 11'b ; dirty <= 1'b1; state <= S2; else if(dirty_1 == 1'b0) buffer_1[cpu_index] <= {1'b1,1'b0,1'b0,1'b1,cpu_tag,mem_cac_data_A}; buffer_2[cpu_index] <= buffer_2[cpu_index] & 11'b ; if((read_a == 1'b1) & (write_a == 1'b0)) else if((read_a == 1'b0) & (write_a == 1'b1)) buffer_1[cpu_index] <= {1'b1,1'b1,1'b0,1'b1,cpu_tag,cpu_cac_data_A}; cac_mem_wrt_a <= 1'b1; cac_memadd_a <= add_mem; cac_mem_data_a <= cpu_cac_data_a; invalidate_a <= 1'b1; snoop_add_a <= add_mem; else if(update_2 == 1'b0) if(dirty_2 == 1'b1) cac_mem_wrt_a <= 1'b1; cac_memadd_a <= add_mem; cac_mem_data_a <= cpu_cac_data_a; buffer_2[cpu_index] <= buffer_2[cpu_index] & 11'b ; dirty <= 1'b1; state <= S2; else if(dirty_2 == 1'b0) buffer_2[cpu_index] <= {1'b1,1'b0,1'b0,1'b1,cpu_tag,mem_cac_data_A}; buffer_1[cpu_index] <= buffer_1[cpu_index] & 11'b ; if((read_a == 1'b1) & (write_a == 1'b0)) 43

44 if((read_a == 1'b0) & (write_a == 1'b1)) buffer_2[cpu_index] <= {1'b1,1'b1,1'b0,1'b1,cpu_tag,cpu_cac_data_A}; cac_mem_wrt_a <= 1'b1; cac_memadd_a <= add_mem; cac_mem_data_a <= cpu_cac_data_a; invalidate_a <= 1'b1; snoop_add_a <= add_mem; //if snooping else if(snoop_b == 1'b1) if((snoop_tag == snoop_buffer_1[6:5]) & (snoop_buffer_1[10] == 1'b1)) if(snoop_buffer_1[9] == 1'b1) priority_bus_bus_req_a <= 1'b1; snoop_page_1 <= 1'b1; back_t_s <= S2; state <= S3; else state <= S2; else if((snoop_tag == snoop_buffer_2[6:5]) & (snoop_buffer_2[10] == 1'b1)) if(snoop_buffer_2[9] == 1'b1) priority_bus_bus_req_a <= 1'b1; snoop_page_2 <= 1'b1; back_t_s <= S2; state <= S3; else state <= S2; else state <= S2; //if invalidation else if(invalidate_b == 1'b1) if((snoop_tag == snoop_buffer_1[6:5])& (snoop_buffer_1[10] == 1'b1)) 44

45 buffer_1[snoop_index] <= buffer_1[snoop_index] & 11'b ; buffer_2[snoop_index] <= buffer_2[snoop_index] 11'b ; state <= S2; else if((snoop_tag == snoop_buffer_2[6:5]) & (snoop_buffer_2[10] == 1'b1)) buffer_2[snoop_index] <= buffer_2[snoop_index] & 11'b ; buffer_1[snoop_index] <= buffer_1[snoop_index] 11'b ; state <= S2; else state <= S2; // S2 // State S3 S3: if(bus_a == 1'b1) priority_bus_bus_req_a <= 1'b0; if((snoop_page_1 == 1'b1)&(snoop_page_2 == 1'b0)) priority_wrt_a <= 1'b1; cac_memadd_a <= snoop_add_b; cac_mem_data_a <= snoop_buffer_1[4:0]; buffer_1[snoop_index] <= buffer_1[snoop_index] & 11'b ; state <= S5; else if((snoop_page_1 == 1'b0)&(snoop_page_2 == 1'b1)) priority_wrt_a <= 1'b1; cac_memadd_a <= snoop_add_b; cac_mem_data_a <= snoop_buffer_2[4:0]; buffer_2[snoop_index] <= buffer_2[snoop_index] & 11'b ; state <= S5; else state <= S3; //S3 //State S4 S4: if(bus_a == 1'b1) bus_bus_bus_req_a <= 1'b0; cac_cpu_hit_a <= 1'b1; invalidate_a <= 1'b1; snoop_add_a <= add_mem; cac_mem_wrt_a <= 1'b1; 45

        cac_memadd_a <= add_mem;
        cac_mem_data_a <= cpu_cac_data_a;
        state <= back_t_s;
        // S4

    //State S5
    S5: state <= back_t_s;

endcase
endmodule //Cache_A

Memory Module

`timescale 1ns/100ps
module Memory (
    clock, rst_1,
    cac_data_avail_mem_add_a, cac_mem_data_a, cac_mem_rd_a, cac_mem_wrt_a,
    cac_data_avail_mem_add_b, cac_mem_data_b, cac_mem_rd_b, cac_mem_wrt_b,
    mem_cac_data_b, mem_cac_data_a,
    bus_bus_mem,
    priority_wrt_a, priority_wrt_b,
    mema, memb,
    bus_mem
);

//Inputs
input clock, rst_1;
input cac_mem_rd_a, cac_mem_wrt_a, cac_mem_rd_b, cac_mem_wrt_b;
input [4:0] cac_data_avail_mem_add_a;
input [4:0] cac_data_avail_mem_add_b;
input [4:0] cac_mem_data_a;
input [4:0] cac_mem_data_b;
input priority_wrt_a;
input priority_wrt_b;
input bus_mem;

//Outputs
output [4:0] mem_cac_data_a;
output [4:0] mem_cac_data_b;
output bus_bus_mem;
output mema;
output memb;

//Registers
reg [0:9] memarray [31:0];
reg [4:0] mem_cac_data_a;
reg [4:0] mem_cac_data_b;
reg [4:0] mem_cache_data;
reg mema;
reg memb;
reg bus_bus_mem;
reg ready_bit_a;
reg ready_bit_b;
reg nexta;
reg nextb;
reg [2:0] state;
reg [4:0] add;

reg [4:0] next_add;

//Internals
parameter S0 = 0;
parameter S1 = 1;
parameter S2 = 2;
parameter S3 = 3;
parameter S4 = 4;

//Memory
always @(posedge clock or negedge rst_1)  // clock edge reconstructed; lost in transcription
if (~rst_1) begin
    // initial memory contents (the 10-bit literals were lost in transcription)
    memarray[31] = 10'b ; memarray[30] = 10'b ; memarray[29] = 10'b ; memarray[28] = 10'b ;
    memarray[27] = 10'b ; memarray[26] = 10'b ; memarray[25] = 10'b ; memarray[24] = 10'b ;
    memarray[23] = 10'b ; memarray[22] = 10'b ; memarray[21] = 10'b ; memarray[20] = 10'b ;
    memarray[19] = 10'b ; memarray[18] = 10'b ; memarray[17] = 10'b ; memarray[16] = 10'b ;
    memarray[15] = 10'b ; memarray[14] = 10'b ; memarray[13] = 10'b ; memarray[12] = 10'b ;
    memarray[11] = 10'b ; memarray[10] = 10'b ; memarray[9] = 10'b ; memarray[8] = 10'b ;
    memarray[7] = 10'b ; memarray[6] = 10'b ; memarray[5] = 10'b ; memarray[4] = 10'b ;
    memarray[3] = 10'b ; memarray[2] = 10'b ; memarray[1] = 10'b ; memarray[0] = 10'b ;
    mem_cac_data_a <= 5'b0;
    mem_cac_data_b <= 5'b0;
    mema <= 1'b0;
    memb <= 1'b0;
    nexta <= 1'b0;
    nextb <= 1'b0;
    add <= 5'b0;
    bus_bus_mem <= 1'b0;
end
else

49 case(state) S0: mema <= 1'b0; memb <= 1'b0; ready_bit_a <= 1'b0; ready_bit_b <= 1'b0; bus_bus_mem <= 1'b0 add <= 5'b0; ; // IF WRITE if (priority_wrt_a == 1'b1) memarray[cac_data_avail_mem_add_a] <= {cac_data_avail_mem_add_a, cac_mem_data_a}; else if (priority_wrt_b == 1'b1) memarray[cac_data_avail_mem_add_b] <= {cac_data_avail_mem_add_b, cac_mem_data_b}; else if ((cac_mem_wrt_a == 1'b1) & (cac_mem_rd_a == 1'b0)) memarray[cac_data_avail_mem_add_a] <= {cac_data_avail_mem_add_a, cac_mem_data_a}; else if ((cac_mem_wrt_b == 1'b1) & (cac_mem_rd_b == 1'b0)) memarray[cac_data_avail_mem_add_b] <= {cac_data_avail_mem_add_b, cac_mem_data_b}; // FINISH WRITE else if ((cac_mem_wrt_a == 1'b0) & ((cac_mem_rd_a == 1'b1) (nexta == 1'b1))) if ((cac_mem_rd_a == 1'b1)&(nextA == 1'b0)) add <= cac_data_avail_mem_add_a; ready_bit_a <= cac_mem_rd_a; else if ((cac_mem_rd_a == 1'b0)&(nextA == 1'b1)) add <= next_add; ready_bit_a <= 1'b1; nexta <= 1'b0; else if ((cac_mem_wrt_b == 1'b0) & ((cac_mem_rd_b == 1'b1) (nextb == 1'b1))) 49

50 if ((cac_mem_rd_b == 1'b1)&(nextB == 1'b0)) add <= cac_data_avail_mem_add_b; ready_bit_b <= cac_mem_rd_b; else if ((cac_mem_rd_b == 1'b0)&(nextB == 1'b1)) add <= next_add; ready_bit_b <= 1'b1; nextb <= 1'b0; else //S0 S0: bus_bus_mem <= 1'b1; if(priority_wrt_a == 1'b1) memarray[cac_data_avail_mem_add_a] <= {cac_data_avail_mem_add_a, cac_mem_data_a}; state <= S2; else if (priority_wrt_b == 1'b1) memarray[cac_data_avail_mem_add_b] <= {cac_data_avail_mem_add_b, cac_mem_data_b}; state <= S2; else if(cac_mem_wrt_a == 1'b1) memarray[cac_data_avail_mem_add_a] <= {cac_data_avail_mem_add_a, cac_mem_data_a}; state <= S2; else if (cac_mem_wrt_b == 1'b1) memarray[cac_data_avail_mem_add_b] <= {cac_data_avail_mem_add_b, cac_mem_data_b}; state <= S2; else state <= S2; //S0 S2: if(priority_wrt_a == 1'b1) 50

51 case memarray[cac_data_avail_mem_add_a] <= {cac_data_avail_mem_add_a, cac_mem_data_a}; state <= S2; else if (priority_wrt_b == 1'b1) memarray[cac_data_avail_mem_add_b] <= {cac_data_avail_mem_add_b, cac_mem_data_a}; state <= S2; else if (cac_mem_rd_a == 1'b1) nexta <= 1'b1; next_add <= cac_data_avail_mem_add_a; state <= S2; else if (cac_mem_rd_b == 1'b1) nextb <= 1'b1; next_add <= cac_data_avail_mem_add_b; state <= S2; else if (cac_mem_wrt_a == 1'b1) memarray[cac_data_avail_mem_add_a] <= {cac_data_avail_mem_add_a, cac_mem_data_a}; state <= S2; else if (cac_mem_wrt_b == 1'b1) memarray[cac_data_avail_mem_add_b] <= {cac_data_avail_mem_add_b, cac_mem_data_b}; state <= S2; else if (bus_mem == 1'b1) bus_bus_mem <= 1'b0; if ((ready_bit_a == 1'b1)&(ready_bit_B == 1'b0)) mem_cac_data_a <= memarray[add]; mema <= 1'b1; if ((ready_bit_a == 1'b0)&(ready_bit_B == 1'b1)) mem_cac_data_b <= memarray[add]; memb <= 1'b1; else state <= S2; //S2 51

endcase
endmodule

Memory Bus Controller

`timescale 1ns/100ps
module MemoryBusController(
    // port order follows the test bench instantiation
    clock,
    rst_1,
    bus_bus_req_a,
    bus_bus_req_b,
    bus_req_mem,         // request from the memory
    priority_bus_req_a,
    priority_bus_req_b,
    bus_mem,             // grant to the memory
    bus_a,
    bus_b
);

// Inputs
input clock;
input rst_1;
input bus_bus_req_a;
input bus_bus_req_b;
input priority_bus_req_a;
input priority_bus_req_b;
input bus_req_mem;

//Outputs
output bus_a;
output bus_b;
output bus_mem;

//Registers
reg bus_a;
reg bus_b;
reg bus_mem;
reg [2:0] bus_state;

//Internals
wire clock;
wire rst_1;
wire bus_bus_req_a;
wire bus_bus_req_b;
wire priority_bus_req_a;
wire priority_bus_req_b;
wire bus_req_mem;

//Parameters
parameter S0 = 0; //Initial
parameter S1 = 1; //Granting the bus to the Cache A
parameter S2 = 2; //Granting the bus to the Cache B
parameter S3 = 3; //Granting the bus to the memory
parameter S4 = 4; //Wait

// Module
always @(posedge clock or negedge rst_1)  // clock edge reconstructed; lost in transcription
if (rst_1 == 0) begin
    //Initials
    bus_state <= S0;  // reconstructed from the truncated "Bus_" in the source
    bus_a <= 1'b0;
    bus_b <= 1'b0;
    bus_mem <= 1'b0;
end
else
case (bus_state)
    S0: begin //Initial
        bus_a <= 1'b0;
        bus_b <= 1'b0;
        bus_mem <= 1'b0;
        if (priority_bus_req_a == 1'b1) //Priority Cache A
            bus_state <= S1;
        else if (priority_bus_req_b == 1'b1) //Priority Cache B
            bus_state <= S2;
        else if (bus_bus_req_a == 1'b0 & bus_bus_req_b == 1'b0 & bus_req_mem == 1'b1) //Memory
            bus_state <= S3;
        else if (bus_bus_req_a == 1'b1 & bus_bus_req_b == 1'b0 & bus_req_mem == 1'b1)
            bus_state <= S3;
        else if (bus_bus_req_a == 1'b0 & bus_bus_req_b == 1'b1 & bus_req_mem == 1'b1)
            bus_state <= S3;
        else if (bus_bus_req_a == 1'b1 & bus_bus_req_b == 1'b1 & bus_req_mem == 1'b1)
            bus_state <= S3;
        else if (bus_bus_req_a == 1'b1 & bus_bus_req_b == 1'b0 & bus_req_mem == 1'b0)
            bus_state <= S1;
        else if (bus_bus_req_a == 1'b0 & bus_bus_req_b == 1'b1 & bus_req_mem == 1'b0)
            bus_state <= S2;

        else if (bus_bus_req_a == 1'b1 & bus_bus_req_b == 1'b1 & bus_req_mem == 1'b0)
            bus_state <= S1;
        else
            bus_state <= S0;  // transition target reconstructed; garbled in the source
    end
    S1: begin //For Cache A
        bus_a <= 1'b1;
        bus_state <= S4;      // transition target reconstructed
    end
    S2: begin //For Cache B
        bus_b <= 1'b1;
        bus_state <= S4;      // transition target reconstructed
    end
    S3: begin //For memory
        bus_mem <= 1'b1;
        bus_state <= S4;      // transition target reconstructed
    end
    S4: begin //Wait
        bus_a <= 1'b0;
        bus_b <= 1'b0;
        bus_mem <= 1'b0;
        bus_state <= S0;      // transition target reconstructed
    end
endcase
endmodule

Memory Mapping Unit

`timescale 1ns / 100ps
module MMU (
    clock, rst_1,
    cpu_cac_add_a, cpu_cac_add_b,
    cpu_cac_rd_a, cpu_cac_rd_b,
    cpu_cac_wrt_a, cpu_cac_wrt_b,
    cpu_cac_data_a, cpu_cac_data_b,
    cpu_mmu_add_a, cpu_mmu_add_b,
    cpu_mmu_rd_a, cpu_mmu_rd_b,
    cpu_mmu_wrt_a, cpu_mmu_wrt_b,

    cpu_mmu_data_a, cpu_mmu_data_b
);

//Inputs
input clock, rst_1;
input cpu_mmu_rd_a, cpu_mmu_rd_b;
input cpu_mmu_wrt_a, cpu_mmu_wrt_b;
input [4:0] cpu_mmu_data_a;
input [4:0] cpu_mmu_data_b;
input [6:0] cpu_mmu_add_a;
input [6:0] cpu_mmu_add_b;

//Outputs
output cpu_cac_rd_a, cpu_cac_rd_b;
output cpu_cac_wrt_a, cpu_cac_wrt_b;
output [4:0] cpu_cac_data_a;
output [4:0] cpu_cac_data_b;
output [4:0] cpu_cac_add_a;
output [4:0] cpu_cac_add_b;

//Registers
reg cpu_cac_rd_a, cpu_cac_rd_b;
reg cpu_cac_wrt_a, cpu_cac_wrt_b;
reg [4:0] cpu_cac_data_a;
reg [4:0] cpu_cac_data_b;
reg [4:0] cpu_cac_add_a;
reg [4:0] cpu_cac_add_b;

//Internals
wire clock, rst_1, cpu_mmu_rd_a, cpu_mmu_rd_b;
wire cpu_mmu_wrt_a, cpu_mmu_wrt_b;

//Begin//
always @(posedge clock or negedge rst_1)  // clock edge reconstructed; lost in transcription
if (~rst_1) begin
    //initials
    cpu_cac_add_a <= 5'b0;
    cpu_cac_add_b <= 5'b0;
    cpu_cac_data_a <= 5'b0;
    cpu_cac_data_b <= 5'b0;
    cpu_cac_rd_a <= 1'b0;
    cpu_cac_rd_b <= 1'b0;
    cpu_cac_wrt_a <= 1'b0;
    cpu_cac_wrt_b <= 1'b0;
end
else begin
    cpu_cac_wrt_a <= cpu_mmu_wrt_a;
    cpu_cac_wrt_b <= cpu_mmu_wrt_b;
    cpu_cac_rd_a <= cpu_mmu_rd_a;
    cpu_cac_rd_b <= cpu_mmu_rd_b;
    cpu_cac_data_a <= cpu_mmu_data_a;

    cpu_cac_data_b <= cpu_mmu_data_b;
    cpu_cac_add_a <= cpu_mmu_add_a[4:0];
    cpu_cac_add_b <= cpu_mmu_add_b[4:0];
end
endmodule

Test Benches: 1/2

`timescale 1ns / 100ps
module Test_Bench_1_2 ();

//Inputs
reg clock, rst_l, cpu_mmu_rd_a, cpu_mmu_wrt_a;
reg cpu_mmu_rd_b, cpu_mmu_wrt_b;
reg [4:0] cpu_mmu_data_a;
reg [4:0] cpu_mmu_data_b;
reg [6:0] cpu_mmu_add_a;
reg [6:0] cpu_mmu_add_b;

// Cache - Cpu
wire cac_cpu_hit_a;
wire cac_cpu_miss_a;
wire [4:0] cac_cpu_data_a;
wire cac_cpu_hit_b;
wire cac_cpu_miss_b;
wire [4:0] cac_cpu_data_b;

// Cache - Memory Bus Controller
wire bus_req_a, priority_req_a;
wire bus_req_b, priority_req_b, req_mem;
wire bus_req_mem, enable_mem, bus_a, bus_b, bus_mem;

// Cache - Memory
wire cac_mem_rd_a, cac_mem_wrt_a, data_avail_mem_a, priority_wrt_a;
wire cac_mem_rd_b, cac_mem_wrt_b, priority_wrt_b, data_avail_mem_b;
wire [4:0] cac_mem_add_a, cac_mem_data_a, mem_cac_data_a;
wire [4:0] cac_mem_add_b, cac_mem_data_b, mem_cac_data_b;

// MMU - Cache
wire cpu_cac_rd_a, cpu_cac_wrt_a;
wire cpu_cac_rd_b, cpu_cac_wrt_b;
wire [4:0] cpu_cac_add_a, cpu_cac_data_a;
wire [4:0] cpu_cac_add_b, cpu_cac_data_b;

//Cache A - B
wire snoop_a, snoop_b;
wire invalidate_a, invalidate_b;
wire [4:0] snoop_add_a;
wire [4:0] snoop_add_b;

// Instantiate Memory
Memory memory_0 (clock, rst_l, cac_mem_add_a, cac_mem_data_a, cac_mem_rd_a, cac_mem_wrt_a, mem_cac_data_a, cac_mem_add_b, cac_mem_data_b, cac_mem_rd_b, cac_mem_wrt_b, mem_cac_data_b, bus_req_mem, data_avail_mem_a, data_avail_mem_b, priority_wrt_a, priority_wrt_b, bus_mem);

//Instantiate MMU
MMU mmu_0 (clock, rst_l, cpu_mmu_add_a, cpu_mmu_add_b, cpu_mmu_rd_a, cpu_mmu_rd_b, cpu_mmu_wrt_a, cpu_mmu_wrt_b, cpu_mmu_data_a, cpu_mmu_data_b, cpu_cac_add_a, cpu_cac_add_b, cpu_cac_rd_a, cpu_cac_rd_b, cpu_cac_wrt_a, cpu_cac_wrt_b, cpu_cac_data_a, cpu_cac_data_b);

//Instantiate CacheA
Cache_A cache_a (clock, rst_l, cpu_cac_add_a, cpu_cac_rd_a, cpu_cac_wrt_a, cpu_cac_data_a, mem_cac_data_a, data_avail_mem_a, cac_cpu_hit_a, cac_cpu_miss_a, cac_cpu_data_a, cac_mem_add_a, cac_mem_data_a, cac_mem_rd_a, cac_mem_wrt_a, priority_wrt_a, bus_req_a, bus_a, priority_req_a, snoop_b, snoop_add_b, snoop_a, snoop_add_a, invalidate_a, invalidate_b);

//Instantiate CacheB
Cache_B cache_b (clock, rst_l, cpu_cac_add_b, cpu_cac_rd_b, cpu_cac_wrt_b, cpu_cac_data_b, mem_cac_data_b, data_avail_mem_b, cac_cpu_hit_b, cac_cpu_miss_b, cac_cpu_data_b, cac_mem_add_b, cac_mem_data_b, cac_mem_rd_b, cac_mem_wrt_b, priority_wrt_b, bus_req_b, bus_b, priority_req_b, snoop_a, snoop_add_a, snoop_b, snoop_add_b, invalidate_a, invalidate_b);

//Instantiate MemoryBusController
MemoryBusController mbc_0 (clock, rst_l, bus_req_a, bus_req_b, req_mem, priority_req_a, priority_req_b, enable_mem, bus_a, bus_b);

always #5 clock <= ~clock;

// Start
initial begin
    //initials
    clock <= 1'b0;
    rst_l <= 1'b1;
    cpu_mmu_rd_a <= 1'b0;
    cpu_mmu_rd_b <= 1'b0;
    cpu_mmu_wrt_a <= 1'b0;
    cpu_mmu_wrt_b <= 1'b0;
    cpu_mmu_add_a <= 7'b0;
    cpu_mmu_add_b <= 7'b0;
    cpu_mmu_data_a <= 5'b0;
    cpu_mmu_data_b <= 5'b0;

    # 10 rst_l <= 1'b0;
    # 10 rst_l <= 1'b1;

    # 10 cpu_mmu_rd_a <= 1'b1; cpu_mmu_add_a <= 7'b ;
         cpu_mmu_rd_b <= 1'b1; cpu_mmu_add_b <= 7'b ;
    # 10 cpu_mmu_rd_a <= 1'b0; cpu_mmu_rd_b <= 1'b0;
    # 100

    # 10 cpu_mmu_rd_a <= 1'b1; cpu_mmu_add_a <= 7'b ;

         cpu_mmu_rd_b <= 1'b1; cpu_mmu_add_b <= 7'b ;
    # 10 cpu_mmu_rd_a <= 1'b0; cpu_mmu_rd_b <= 1'b0;
    # 100

    # 10 cpu_mmu_rd_a <= 1'b1; cpu_mmu_add_a <= 7'b ;
         cpu_mmu_rd_b <= 1'b1; cpu_mmu_add_b <= 7'b ;
    # 10 cpu_mmu_rd_a <= 1'b0; cpu_mmu_rd_b <= 1'b0;
end
endmodule

Test Bench 3 (only the stimulus differs from Test Bench 1/2; the module wrapper is the same):

initial begin
    # 10 cpu_mmu_wrt_a <= 1'b1; cpu_mmu_add_a <= 7'b ; cpu_mmu_data_a <= 5'b11011;
         cpu_mmu_wrt_b <= 1'b1; cpu_mmu_add_b <= 7'b ; cpu_mmu_data_b <= 5'b10001;
    # 10 cpu_mmu_wrt_a <= 1'b0; cpu_mmu_wrt_b <= 1'b0;
    # 150

    # 10 cpu_mmu_wrt_a <= 1'b1; cpu_mmu_add_a <= 7'b ; cpu_mmu_data_a <= 5'b00100;
         cpu_mmu_wrt_b <= 1'b1; cpu_mmu_add_b <= 7'b ; cpu_mmu_data_b <= 5'b01110;
    # 10 cpu_mmu_wrt_a <= 1'b0; cpu_mmu_wrt_b <= 1'b0;
end

Test Bench 4 (only the stimulus differs from Test Bench 1/2; the module wrapper is the same):

initial begin
    # 10 cpu_mmu_wrt_a <= 1'b1; cpu_mmu_add_a <= 7'b ; cpu_mmu_data_a <= 5'b11011;
    # 10 cpu_mmu_wrt_a <= 1'b0;
    # 100

    # 10 cpu_mmu_wrt_a <= 1'b1; cpu_mmu_add_a <= 7'b ; cpu_mmu_data_a <= 5'b00100;
    # 10 cpu_mmu_wrt_a <= 1'b0;
    # 50

    # 10 cpu_mmu_rd_b <= 1'b1; cpu_mmu_add_b <= 7'b ;
    # 10 cpu_mmu_rd_b <= 1'b0;
    # 200
end
endmodule

References

1) John L. Hennessy and David A. Patterson, Computer Architecture: A Quantitative Approach.
2) David A. Patterson and John L. Hennessy, Computer Organization and Design.
3) Michael D. Ciletti, Advanced Digital Design with the Verilog HDL.


More information

ECE331: Hardware Organization and Design

ECE331: Hardware Organization and Design ECE331: Hardware Organization and Design Lecture 24: Cache Performance Analysis Adapted from Computer Organization and Design, Patterson & Hennessy, UCB Overview Last time: Associative caches How do we

More information

Lecture 24: Thread Level Parallelism -- Distributed Shared Memory and Directory-based Coherence Protocol

Lecture 24: Thread Level Parallelism -- Distributed Shared Memory and Directory-based Coherence Protocol Lecture 24: Thread Level Parallelism -- Distributed Shared Memory and Directory-based Coherence Protocol CSE 564 Computer Architecture Fall 2016 Department of Computer Science and Engineering Yonghong

More information

Chap. 4 Multiprocessors and Thread-Level Parallelism

Chap. 4 Multiprocessors and Thread-Level Parallelism Chap. 4 Multiprocessors and Thread-Level Parallelism Uniprocessor performance Performance (vs. VAX-11/780) 10000 1000 100 10 From Hennessy and Patterson, Computer Architecture: A Quantitative Approach,

More information

Hardware Description Language (HDL)

Hardware Description Language (HDL) Hardware Description Language (HDL) What is the need for Hardware Description Language? Model, Represent, And Simulate Digital Hardware Hardware Concurrency Parallel Activity Flow Semantics for Signal

More information

CS433 Homework 6. Problem 1 [15 points] Assigned on 11/28/2017 Due in class on 12/12/2017

CS433 Homework 6. Problem 1 [15 points] Assigned on 11/28/2017 Due in class on 12/12/2017 CS433 Homework 6 Assigned on 11/28/2017 Due in class on 12/12/2017 Instructions: 1. Please write your name and NetID clearly on the first page. 2. Refer to the course fact sheet for policies on collaboration.

More information

EE178 Lecture Verilog FSM Examples. Eric Crabill SJSU / Xilinx Fall 2007

EE178 Lecture Verilog FSM Examples. Eric Crabill SJSU / Xilinx Fall 2007 EE178 Lecture Verilog FSM Examples Eric Crabill SJSU / Xilinx Fall 2007 In Real-time Object-oriented Modeling, Bran Selic and Garth Gullekson view a state machine as: A set of input events A set of output

More information

Lab 7 (All Sections) Prelab: Verilog Review and ALU Datapath and Control

Lab 7 (All Sections) Prelab: Verilog Review and ALU Datapath and Control Lab 7 (All Sections) Prelab: Verilog Review and ALU Datapath and Control Name: Sign the following statement: On my honor, as an Aggie, I have neither given nor received unauthorized aid on this academic

More information

6 th Lecture :: The Cache - Part Three

6 th Lecture :: The Cache - Part Three Dr. Michael Manzke :: CS7031 :: 6 th Lecture :: The Cache - Part Three :: October 20, 2010 p. 1/17 [CS7031] Graphics and Console Hardware and Real-time Rendering 6 th Lecture :: The Cache - Part Three

More information

Computer Organization

Computer Organization University of Pune S.E. I.T. Subject code: 214442 Computer Organization Part 25 : MESI Protocol UNIT IV Tushar B. Kute, Department of Information Technology, Sandip Institute of Technology & Research Centre,

More information

EECS 470 Final Exam Winter 2012

EECS 470 Final Exam Winter 2012 EECS 470 Final Exam Winter 2012 Name: unique name: Sign the honor code: I have neither given nor received aid on this exam nor observed anyone else doing so. Scores: # Points Page 2 /11 Page 3 /13 Page

More information

High Performance Multiprocessor System

High Performance Multiprocessor System High Performance Multiprocessor System Requirements : - Large Number of Processors ( 4) - Large WriteBack Caches for Each Processor. Less Bus Traffic => Higher Performance - Large Shared Main Memories

More information

Overview: Shared Memory Hardware. Shared Address Space Systems. Shared Address Space and Shared Memory Computers. Shared Memory Hardware

Overview: Shared Memory Hardware. Shared Address Space Systems. Shared Address Space and Shared Memory Computers. Shared Memory Hardware Overview: Shared Memory Hardware Shared Address Space Systems overview of shared address space systems example: cache hierarchy of the Intel Core i7 cache coherency protocols: basic ideas, invalidate and

More information

Overview: Shared Memory Hardware

Overview: Shared Memory Hardware Overview: Shared Memory Hardware overview of shared address space systems example: cache hierarchy of the Intel Core i7 cache coherency protocols: basic ideas, invalidate and update protocols false sharing

More information

Lecture 24: Virtual Memory, Multiprocessors

Lecture 24: Virtual Memory, Multiprocessors Lecture 24: Virtual Memory, Multiprocessors Today s topics: Virtual memory Multiprocessors, cache coherence 1 Virtual Memory Processes deal with virtual memory they have the illusion that a very large

More information

Verilog Coding Guideline

Verilog Coding Guideline Verilog Coding Guideline Digital Circuit Lab TA: Po-Chen Wu Outline Introduction to Verilog HDL Verilog Syntax Combinational and Sequential Logics Module Hierarchy Write Your Design Finite State Machine

More information

M A S S A C H U S E T T S I N S T I T U T E O F T E C H N O L O G Y DEPARTMENT OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE

M A S S A C H U S E T T S I N S T I T U T E O F T E C H N O L O G Y DEPARTMENT OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE M A S S A C H U S E T T S I N S T I T U T E O F T E C H N O L O G Y DEPARTMENT OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE 6.111 Introductory Digital Systems Laboratory Fall 2017 Lecture PSet #6 of

More information

Lecture 8: Snooping and Directory Protocols. Topics: split-transaction implementation details, directory implementations (memory- and cache-based)

Lecture 8: Snooping and Directory Protocols. Topics: split-transaction implementation details, directory implementations (memory- and cache-based) Lecture 8: Snooping and Directory Protocols Topics: split-transaction implementation details, directory implementations (memory- and cache-based) 1 Split Transaction Bus So far, we have assumed that a

More information

EECS150 - Digital Design Lecture 5 - Verilog Logic Synthesis

EECS150 - Digital Design Lecture 5 - Verilog Logic Synthesis EECS150 - Digital Design Lecture 5 - Verilog Logic Synthesis Jan 31, 2012 John Wawrzynek Spring 2012 EECS150 - Lec05-verilog_synth Page 1 Outline Quick review of essentials of state elements Finite State

More information

Chapter 5. Thread-Level Parallelism

Chapter 5. Thread-Level Parallelism Chapter 5 Thread-Level Parallelism Instructor: Josep Torrellas CS433 Copyright Josep Torrellas 1999, 2001, 2002, 2013 1 Progress Towards Multiprocessors + Rate of speed growth in uniprocessors saturated

More information

CME341 Assignment 4. module if\_else\_combinational\_logic( input [3:0] a, b, output reg [3:0] y ); * begin

CME341 Assignment 4. module if\_else\_combinational\_logic( input [3:0] a, b, output reg [3:0] y ); * begin CME341 Assignment 4 1. The verilog description below is an example of how code can get butchered by an engineer with lazy debugging habits. The lazy debugger wanted to try something and yet be able to

More information

EC 513 Computer Architecture

EC 513 Computer Architecture EC 513 Computer Architecture Cache Coherence - Snoopy Cache Coherence rof. Michel A. Kinsy Consistency in SMs CU-1 CU-2 A 100 Cache-1 A 100 Cache-2 CU- bus A 100 Consistency in SMs CU-1 CU-2 A 200 Cache-1

More information

Lecture 2: Snooping and Directory Protocols. Topics: Snooping wrap-up and directory implementations

Lecture 2: Snooping and Directory Protocols. Topics: Snooping wrap-up and directory implementations Lecture 2: Snooping and Directory Protocols Topics: Snooping wrap-up and directory implementations 1 Split Transaction Bus So far, we have assumed that a coherence operation (request, snoops, responses,

More information

Parallel Computer Architecture Spring Distributed Shared Memory Architectures & Directory-Based Memory Coherence

Parallel Computer Architecture Spring Distributed Shared Memory Architectures & Directory-Based Memory Coherence Parallel Computer Architecture Spring 2018 Distributed Shared Memory Architectures & Directory-Based Memory Coherence Nikos Bellas Computer and Communications Engineering Department University of Thessaly

More information

Problem Set 3 Solutions

Problem Set 3 Solutions Problem Set 3 Solutions ECE 551: Digital System Design and Synthesis Fall 2001 Final Version 1) For each of the following always behaviors: a) Does the given always behavior need a default statement as

More information

ECE331: Hardware Organization and Design

ECE331: Hardware Organization and Design ECE331: Hardware Organization and Design Lecture 22: Direct Mapped Cache Adapted from Computer Organization and Design, Patterson & Hennessy, UCB Intel 8-core i7-5960x 3 GHz, 8-core, 20 MB of cache, 140

More information

Cache Coherence. Bryan Mills, PhD. Slides provided by Rami Melhem

Cache Coherence. Bryan Mills, PhD. Slides provided by Rami Melhem Cache Coherence Bryan Mills, PhD Slides provided by Rami Melhem Cache coherence Programmers have no control over caches and when they get updated. x = 2; /* initially */ y0 eventually ends up = 2 y1 eventually

More information

Thread- Level Parallelism. ECE 154B Dmitri Strukov

Thread- Level Parallelism. ECE 154B Dmitri Strukov Thread- Level Parallelism ECE 154B Dmitri Strukov Introduc?on Thread- Level parallelism Have mul?ple program counters and resources Uses MIMD model Targeted for?ghtly- coupled shared- memory mul?processors

More information

CACHE MEMORIES ADVANCED COMPUTER ARCHITECTURES. Slides by: Pedro Tomás

CACHE MEMORIES ADVANCED COMPUTER ARCHITECTURES. Slides by: Pedro Tomás CACHE MEMORIES Slides by: Pedro Tomás Additional reading: Computer Architecture: A Quantitative Approach, 5th edition, Chapter 2 and Appendix B, John L. Hennessy and David A. Patterson, Morgan Kaufmann,

More information

Memory Hierarchy. Maurizio Palesi. Maurizio Palesi 1

Memory Hierarchy. Maurizio Palesi. Maurizio Palesi 1 Memory Hierarchy Maurizio Palesi Maurizio Palesi 1 References John L. Hennessy and David A. Patterson, Computer Architecture a Quantitative Approach, second edition, Morgan Kaufmann Chapter 5 Maurizio

More information

ECSE 425 Lecture 30: Directory Coherence

ECSE 425 Lecture 30: Directory Coherence ECSE 425 Lecture 30: Directory Coherence H&P Chapter 4 Last Time Snoopy Coherence Symmetric SMP Performance 2 Today Directory- based Coherence 3 A Scalable Approach: Directories One directory entry for

More information

Lecture 18: Coherence Protocols. Topics: coherence protocols for symmetric and distributed shared-memory multiprocessors (Sections

Lecture 18: Coherence Protocols. Topics: coherence protocols for symmetric and distributed shared-memory multiprocessors (Sections Lecture 18: Coherence Protocols Topics: coherence protocols for symmetric and distributed shared-memory multiprocessors (Sections 4.2-4.4) 1 SMP/UMA/Centralized Memory Multiprocessor Main Memory I/O System

More information

UNIT I (Two Marks Questions & Answers)

UNIT I (Two Marks Questions & Answers) UNIT I (Two Marks Questions & Answers) Discuss the different ways how instruction set architecture can be classified? Stack Architecture,Accumulator Architecture, Register-Memory Architecture,Register-

More information

EECS150 - Digital Design Lecture 10 Logic Synthesis

EECS150 - Digital Design Lecture 10 Logic Synthesis EECS150 - Digital Design Lecture 10 Logic Synthesis September 26, 2002 John Wawrzynek Fall 2002 EECS150 Lec10-synthesis Page 1 Logic Synthesis Verilog and VHDL stated out as simulation languages, but quickly

More information

EECS150 - Digital Design Lecture 6 - Logic Simulation. Encoder Example

EECS150 - Digital Design Lecture 6 - Logic Simulation. Encoder Example EECS150 - Digital Design Lecture 6 - Logic Simulation Feb 7, 2013 John Wawrzynek Spring 2013 EECS150 - Lec06-sim Page 1 Encoder Example What is y if x == 4 b1111? always @(x) : encode if (x == 4'b0001)

More information

Lecture 29 Review" CPU time: the best metric" Be sure you understand CC, clock period" Common (and good) performance metrics"

Lecture 29 Review CPU time: the best metric Be sure you understand CC, clock period Common (and good) performance metrics Be sure you understand CC, clock period Lecture 29 Review Suggested reading: Everything Q1: D[8] = D[8] + RF[1] + RF[4] I[15]: Add R2, R1, R4 RF[1] = 4 I[16]: MOV R3, 8 RF[4] = 5 I[17]: Add R2, R2, R3

More information

Chapter 5 (Part II) Large and Fast: Exploiting Memory Hierarchy. Baback Izadi Division of Engineering Programs

Chapter 5 (Part II) Large and Fast: Exploiting Memory Hierarchy. Baback Izadi Division of Engineering Programs Chapter 5 (Part II) Baback Izadi Division of Engineering Programs bai@engr.newpaltz.edu Virtual Machines Host computer emulates guest operating system and machine resources Improved isolation of multiple

More information

MEMORY HIERARCHY BASICS. B649 Parallel Architectures and Programming

MEMORY HIERARCHY BASICS. B649 Parallel Architectures and Programming MEMORY HIERARCHY BASICS B649 Parallel Architectures and Programming BASICS Why Do We Need Caches? 3 Overview 4 Terminology cache virtual memory memory stall cycles direct mapped valid bit block address

More information

Lecture 30: Multiprocessors Flynn Categories, Large vs. Small Scale, Cache Coherency Professor Randy H. Katz Computer Science 252 Spring 1996

Lecture 30: Multiprocessors Flynn Categories, Large vs. Small Scale, Cache Coherency Professor Randy H. Katz Computer Science 252 Spring 1996 Lecture 30: Multiprocessors Flynn Categories, Large vs. Small Scale, Cache Coherency Professor Randy H. Katz Computer Science 252 Spring 1996 RHK.S96 1 Flynn Categories SISD (Single Instruction Single

More information

Snooping coherence protocols (cont.)

Snooping coherence protocols (cont.) Snooping coherence protocols (cont.) A four-state update protocol [ 5.3.3] When there is a high degree of sharing, invalidation-based protocols perform poorly. Blocks are often invalidated, and then have

More information

Computer Architecture

Computer Architecture 18-447 Computer Architecture CSCI-564 Advanced Computer Architecture Lecture 29: Consistency & Coherence Lecture 20: Consistency and Coherence Bo Wu Prof. Onur Mutlu Colorado Carnegie School Mellon University

More information

10 Parallel Organizations: Multiprocessor / Multicore / Multicomputer Systems

10 Parallel Organizations: Multiprocessor / Multicore / Multicomputer Systems 1 License: http://creativecommons.org/licenses/by-nc-nd/3.0/ 10 Parallel Organizations: Multiprocessor / Multicore / Multicomputer Systems To enhance system performance and, in some cases, to increase

More information

Advanced Memory Organizations

Advanced Memory Organizations CSE 3421: Introduction to Computer Architecture Advanced Memory Organizations Study: 5.1, 5.2, 5.3, 5.4 (only parts) Gojko Babić 03-29-2018 1 Growth in Performance of DRAM & CPU Huge mismatch between CPU

More information

Data-Centric Consistency Models. The general organization of a logical data store, physically distributed and replicated across multiple processes.

Data-Centric Consistency Models. The general organization of a logical data store, physically distributed and replicated across multiple processes. Data-Centric Consistency Models The general organization of a logical data store, physically distributed and replicated across multiple processes. Consistency models The scenario we will be studying: Some

More information

Computer Architecture. A Quantitative Approach, Fifth Edition. Chapter 5. Multiprocessors and Thread-Level Parallelism

Computer Architecture. A Quantitative Approach, Fifth Edition. Chapter 5. Multiprocessors and Thread-Level Parallelism Computer Architecture A Quantitative Approach, Fifth Edition Chapter 5 Multiprocessors and Thread-Level Parallelism 1 Introduction Thread-Level parallelism Have multiple program counters Uses MIMD model

More information

ECE 485/585 Microprocessor System Design

ECE 485/585 Microprocessor System Design Microprocessor System Design Lecture 11: Reducing Hit Time Cache Coherence Zeshan Chishti Electrical and Computer Engineering Dept Maseeh College of Engineering and Computer Science Source: Lecture based

More information

CS433 Homework 6. Problem 1 [15 points] Assigned on 11/28/2017 Due in class on 12/12/2017

CS433 Homework 6. Problem 1 [15 points] Assigned on 11/28/2017 Due in class on 12/12/2017 CS433 Homework 6 Assigned on 11/28/2017 Due in class on 12/12/2017 Instructions: 1. Please write your name and NetID clearly on the first page. 2. Refer to the course fact sheet for policies on collaboration.

More information

Shared Symmetric Memory Systems

Shared Symmetric Memory Systems Shared Symmetric Memory Systems Computer Architecture J. Daniel García Sánchez (coordinator) David Expósito Singh Francisco Javier García Blas ARCOS Group Computer Science and Engineering Department University

More information

Portland State University ECE 588/688. Cache Coherence Protocols

Portland State University ECE 588/688. Cache Coherence Protocols Portland State University ECE 588/688 Cache Coherence Protocols Copyright by Alaa Alameldeen 2018 Conditions for Cache Coherence Program Order. A read by processor P to location A that follows a write

More information

CS 152 Computer Architecture and Engineering. Lecture 19: Directory-Based Cache Protocols

CS 152 Computer Architecture and Engineering. Lecture 19: Directory-Based Cache Protocols CS 152 Computer Architecture and Engineering Lecture 19: Directory-Based Cache Protocols Krste Asanovic Electrical Engineering and Computer Sciences University of California, Berkeley http://www.eecs.berkeley.edu/~krste

More information

EITF20: Computer Architecture Part 5.2.1: IO and MultiProcessor

EITF20: Computer Architecture Part 5.2.1: IO and MultiProcessor EITF20: Computer Architecture Part 5.2.1: IO and MultiProcessor Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration I/O MultiProcessor Summary 2 Virtual memory benifits Using physical memory efficiently

More information

Laboratory ELEC

Laboratory ELEC Laboratory ELEC 4708 2003 1.0 Design of an Integrated Multiplier This is a sample design done by Gord Allan. The design is a two s complement multiplier. Gord s files allow a choice of two cell libraries

More information

Handout 3 Multiprocessor and thread level parallelism

Handout 3 Multiprocessor and thread level parallelism Handout 3 Multiprocessor and thread level parallelism Outline Review MP Motivation SISD v SIMD (SIMT) v MIMD Centralized vs Distributed Memory MESI and Directory Cache Coherency Synchronization and Relaxed

More information

Cache Coherence Tutorial

Cache Coherence Tutorial Cache Coherence Tutorial The cache coherence protocol described in the book is not really all that difficult and yet a lot of people seem to have troubles when it comes to using it or answering an assignment

More information

CS 152 Computer Architecture and Engineering. Lecture 19: Directory-Based Cache Protocols

CS 152 Computer Architecture and Engineering. Lecture 19: Directory-Based Cache Protocols CS 152 Computer Architecture and Engineering Lecture 19: Directory-Based Cache Protocols Krste Asanovic Electrical Engineering and Computer Sciences University of California, Berkeley http://www.eecs.berkeley.edu/~krste

More information

CSE 591: Advanced Hardware Design and Verification (2012 Spring) LAB #0

CSE 591: Advanced Hardware Design and Verification (2012 Spring) LAB #0 Lab 0: Tutorial on Xilinx Project Navigator & ALDEC s Active-HDL Simulator CSE 591: Advanced Hardware Design and Verification Assigned: 01/05/2011 Due: 01/19/2011 Table of Contents 1 Overview... 2 1.1

More information

Lecture 25: Multiprocessors

Lecture 25: Multiprocessors Lecture 25: Multiprocessors Today s topics: Virtual memory wrap-up Snooping-based cache coherence protocol Directory-based cache coherence protocol Synchronization 1 TLB and Cache Is the cache indexed

More information

ECE 4514 Digital Design II. Spring Lecture 9: Review of Key Ideas, System Commands and Testbenches

ECE 4514 Digital Design II. Spring Lecture 9: Review of Key Ideas, System Commands and Testbenches ECE 4514 Digital Design II Lecture 9: Review of Key Ideas, System Commands and Testbenches A Language Lecture Iterating the Key Ideas Verilog is a modeling language. It cannot express hardware directly.

More information

ECEN : Microprocessor System Design Department of Electrical and Computer Engineering Texas A&M University. Homework #1 Solutions

ECEN : Microprocessor System Design Department of Electrical and Computer Engineering Texas A&M University. Homework #1 Solutions ECEN 449 749: Microprocessor System Design Department of Electrical and Computer Engineering Texas A&M University Homework #1 Solutions Upload your homework solution to ecampus as a single pdf file. Your

More information

Course Topics - Outline

Course Topics - Outline Course Topics - Outline Lecture 1 - Introduction Lecture 2 - Lexical conventions Lecture 3 - Data types Lecture 4 - Operators Lecture 5 - Behavioral modeling A Lecture 6 Behavioral modeling B Lecture 7

More information

Cache Coherence Protocols: Implementation Issues on SMP s. Cache Coherence Issue in I/O

Cache Coherence Protocols: Implementation Issues on SMP s. Cache Coherence Issue in I/O 6.823, L21--1 Cache Coherence Protocols: Implementation Issues on SMP s Laboratory for Computer Science M.I.T. http://www.csg.lcs.mit.edu/6.823 Cache Coherence Issue in I/O 6.823, L21--2 Processor Processor

More information

ECE232: Hardware Organization and Design

ECE232: Hardware Organization and Design ECE232: Hardware Organization and Design Lecture 22: Introduction to Caches Adapted from Computer Organization and Design, Patterson & Hennessy, UCB Overview Caches hold a subset of data from the main

More information

14:332:231 DIGITAL LOGIC DESIGN. Verilog Functions and Tasks

14:332:231 DIGITAL LOGIC DESIGN. Verilog Functions and Tasks 4:332:23 DIGITAL LOGIC DESIGN Ivan Marsic, Rutgers University Electrical & Computer Engineering Fall 203 Lecture #24: Verilog Time Dimension and Test Benches Verilog Functions and Tasks Verilog function

More information

CSC526: Parallel Processing Fall 2016

CSC526: Parallel Processing Fall 2016 CSC526: Parallel Processing Fall 2016 WEEK 5: Caches in Multiprocessor Systems * Addressing * Cache Performance * Writing Policy * Cache Coherence (CC) Problem * Snoopy Bus Protocols PART 1: HARDWARE Dr.

More information

ECEC 355: Cache Design

ECEC 355: Cache Design ECEC 355: Cache Design November 28, 2007 Terminology Let us first define some general terms applicable to caches. Cache block or line. The minimum unit of information (in bytes) that can be either present

More information

Chapter 5A. Large and Fast: Exploiting Memory Hierarchy

Chapter 5A. Large and Fast: Exploiting Memory Hierarchy Chapter 5A Large and Fast: Exploiting Memory Hierarchy Memory Technology Static RAM (SRAM) Fast, expensive Dynamic RAM (DRAM) In between Magnetic disk Slow, inexpensive Ideal memory Access time of SRAM

More information

Lecture 25: Multiprocessors. Today s topics: Snooping-based cache coherence protocol Directory-based cache coherence protocol Synchronization

Lecture 25: Multiprocessors. Today s topics: Snooping-based cache coherence protocol Directory-based cache coherence protocol Synchronization Lecture 25: Multiprocessors Today s topics: Snooping-based cache coherence protocol Directory-based cache coherence protocol Synchronization 1 Snooping-Based Protocols Three states for a block: invalid,

More information

Computer Architecture

Computer Architecture Jens Teubner Computer Architecture Summer 2016 1 Computer Architecture Jens Teubner, TU Dortmund jens.teubner@cs.tu-dortmund.de Summer 2016 Jens Teubner Computer Architecture Summer 2016 83 Part III Multi-Core

More information

Pipelined MIPS processor with cache controller using VHDL implementation for educational purpose

Pipelined MIPS processor with cache controller using VHDL implementation for educational purpose Journal From the SelectedWorks of Kirat Pal Singh Winter December 28, 203 Pipelined MIPS processor with cache controller using VHDL implementation for educational purpose Hadeel Sh. Mahmood, College of

More information