
ECE 254A Advanced Computer Architecture: Supercomputers, Fall 2006
University of California, Santa Barbara
Department of Electrical and Computer Engineering
Project 2: Designing a Snoopy Cache
Ali Umut Irturk, ECE Department & ECON Department, Graduate Student
11/6/2006

Contents
1) Overview of the Project
2) Discovering the Input and Output Ports
3) Detailed Information about the Designed Caches
4) Test Benches
5) Figures
6) Codes

1) Overview of the Project

The aim of this project is to design a snoopy cache protocol that maintains coherence for multiple processors, using Verilog. In my project, I designed 7 blocks to make this cache protocol fully functional. The design modules are:

1) Cpu: There are two Cpus in my design. These design units request reads or writes from the caches.

2) Cache: There are two Caches in my design. The caches are two-way set associative. There are 8 entries in each cache, and each cache entry has 11 bits, including data, tag, update, dirty and valid bits. These design subjects are considered in detail in the following sections.

3) Memory Mapping Unit: This unit converts the virtual address to the physical address. I designed the virtual address line as 7 bits and the physical address line as 5 bits. The conversion is done by cutting the most significant two bits of the virtual address.

4) Memory Bus Controller: Because we are using different modules which can access the bus at the same time, a memory bus controller is needed.

5) Memory: The memory has 32 entries, and each entry has 10 bits.

These design modules can be seen in Figures 1-6, and the next section, Discovering the Input and Output Ports, gives very detailed information about the usage of the modules. The most important parts of the design are:

1) Two-phase clocking: This gives us the advantage of using both edges of the clock during the state changes (see the sketch after this list).

2) Snooping and Invalidation: Snooping protocols maintain coherence for multiple processors. To maintain the coherence requirement in snooping protocols, I used the write invalidate protocol.
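To make the first point concrete, here is a minimal sketch of the two-phase clocking idea; the module and signal names are illustrative, not the report's, and the counter behavior stands in for arbitrary next-state logic. The pending state is latched on the rising edge and consumed on the falling edge, mirroring the next_s/state pair used in the Cache_A code later in this report.

`timescale 1ns/100ps
// Two-phase clocking sketch (illustrative names): a full state change
// settles within one clock period by using both clock edges.
module two_phase_sketch (
    input  wire       clock,
    input  wire       rst_l,   // asynchronous active-low reset
    input  wire       go,
    output reg  [1:0] state
);
    reg [1:0] next_s;

    // Phase 1: latch the current state on the rising edge.
    always @(posedge clock)
        next_s <= state;

    // Phase 2: compute the new state on the falling edge.
    always @(negedge clock or negedge rst_l)
        if (!rst_l)
            state <= 2'd0;
        else if (go)
            state <= next_s + 2'd1;
endmodule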

In the third section, I give very detailed information about these most important parts of the project. After implementing these modules, I wrote 4 different test benches to verify that my project works properly. I designed test benches for read misses, write misses, snooping and invalidation. The detailed information is given in the fourth section of this report.

2) Discovering the Input and Output Ports

I will consider every component one by one and find its input and output ports. However, at this step I didn't specify the widths of the inputs and outputs.

a) Cpu: As I said, in my design I implemented two cpus, Cpu A and Cpu B, and two caches, Cache A and Cache B. When information is needed from the caches, or information needs to be written, one of the cpus accesses its cache. Thus, when a request for information is considered:

i) The Cpu must signal this situation with a read signal.
ii) The Cpu must indicate where the data is with address bits. (At this step, however, we are using the Memory Mapping Unit.)

When writing is considered:

iii) The Cpu must signal this situation with a write signal.
iv) The Cpu must indicate which data needs to be written with data bits.

v) The Cpu must indicate where the data will be written with address bits (the same bits used for a read request). At this step, however, we are using the Memory Mapping Unit.

This shows that there must be 4 outputs from the Cpus to the Caches (Cache inputs from the Cpu). I named them using cpu_cac_name_nameofthecpu or cpu_mmu_name_nameofthecpu. The read signal and the write signal can each be accomplished with 1 bit; the widths of the address and data bits will be decided later. This module can be seen in Figure 1.

b) Memory Mapping Unit and the Relationship between Cpus, MMU and Caches

The Memory Mapping Unit converts the virtual address to the physical address. I designed the virtual address line as 7 bits and the physical address line as 5 bits. Converting the virtual address line to the physical address line is done by cutting the most significant two bits from the virtual address line. The address bits from Cpu A or Cpu B come to the Memory Mapping Unit as an input. After converting the address bits to a physical address, the MMU sends the address bits to Cache A or Cache B. This module can be seen in Figure 2.
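A minimal sketch of this translation, assuming only what is stated above (a 7-bit virtual address, a 5-bit physical address, and a registered output); the module and signal names are illustrative:

`timescale 1ns/100ps
// Address translation sketch: the 7-bit virtual address is cut down to
// 5 bits by dropping the two most significant bits.
module mmu_sketch (
    input  wire       clock,
    input  wire       rst_l,
    input  wire [6:0] virt_add,  // virtual address from a CPU
    output reg  [4:0] phys_add   // physical address to a cache
);
    always @(posedge clock or negedge rst_l)
        if (!rst_l)
            phys_add <= 5'b0;
        else
            phys_add <= virt_add[4:0];  // bits [6:5] are discarded
endmodule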

c) Memory: If a miss occurs in the Cache after a Cpu request, the cache must access the memory to retrieve data. Thus, the memory needs outputs to the Cache:

Output Ports:

i) The requested data is sent over data bits from the Memory to the Cache. I named this mem_cac_data.
ii) Because we are dealing with different caches and we have a Memory Bus Controller, there must be a 1-bit indicator which shows the caches that the desired data is available. I named this output data_avail_memA or B.

Input Ports: The input ports of the Memory come from the Caches. As mentioned before, if a miss occurs, the memory must be accessed to retrieve the desired block, and if the Cpu wants to write information to the Memory through the Cache, there must be several outputs from the Cache to the Memory. If a miss occurs in the Cache:

i) This information must be given to the Memory by sending a read bit.
ii) The Cache must indicate where the data is with address bits.

If a write is considered:

iii) The Cache must signal this situation with a write bit.
iv) The Cache must indicate which data needs to be written with data bits.
v) If a priority write situation occurs, I defined a 1-bit signal to indicate it, priority_wrt_A or B.

This shows that there must be 5 outputs from the Cache to the Memory (Memory inputs from the Cache). They are named cac_mem_name_nameofthecache. The read bit, write bit and priority write bit can each be accomplished with 1 bit. This module can be seen in Figure 3.
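The read side of this interface can be sketched as below. This is a deliberately simplified, assumed model (a single cache, and only the 5-bit data field of each entry) rather than the full Memory module given in the Codes section:

`timescale 1ns/100ps
// Memory-side handshake sketch: a read returns the addressed entry
// together with a one-cycle data_avail flag.
module memory_sketch (
    input  wire       clock,
    input  wire       rst_l,
    input  wire       rd,          // read request from the cache
    input  wire [4:0] add,         // physical address
    output reg  [4:0] data,        // data returned to the cache
    output reg        data_avail   // "the desired data is available"
);
    reg [4:0] memarray [0:31];     // 32 entries, simplified to 5-bit data
    integer i;

    always @(posedge clock or negedge rst_l)
        if (!rst_l) begin
            for (i = 0; i < 32; i = i + 1)
                memarray[i] <= i[4:0];   // arbitrary initial contents
            data       <= 5'b0;
            data_avail <= 1'b0;
        end else if (rd) begin
            data       <= memarray[add];
            data_avail <= 1'b1;
        end else
            data_avail <= 1'b0;
endmodule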

d) Memory Bus Controller

Because we are using different modules which can access the bus at the same time, a memory bus controller is required in this project. The Memory Bus Controller receives the requests and gives control of the bus to the requesting block. The aim of the priority write bit is to raise the priority of the write-back process.

Inputs to the Memory Bus Controller:
i) The request from Cache A: bus_req_A
ii) Priority request from Cache A: priority_req_A
iii) The request from Cache B: bus_req_B
iv) Priority request from Cache B: priority_req_B
v) The request from the memory: bus_req_mem

Outputs to the Caches and Memory:
vi) The bit that shows the bus is given to Cache A: bus_A
vii) The bit that shows the bus is given to Cache B: bus_B
viii) The bit that shows the bus is given to the Memory: bus_mem

This module can be seen in Figure 4.
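The arbitration order implied by the state machine in the Codes section, which is priority write-backs first, then the memory, then Cache A, then Cache B, can be sketched combinationally as below; the signal names are illustrative, and the real controller is sequential:

`timescale 1ns/100ps
// Combinational arbitration sketch: at most one grant is active at a
// time, following the priority order described above.
module bus_arbiter_sketch (
    input  wire priority_req_a, priority_req_b,
    input  wire bus_req_a, bus_req_b, bus_req_mem,
    output reg  bus_a, bus_b, bus_mem
);
    always @(*) begin
        {bus_a, bus_b, bus_mem} = 3'b000;
        if      (priority_req_a) bus_a   = 1'b1;  // dirty write-backs win first
        else if (priority_req_b) bus_b   = 1'b1;
        else if (bus_req_mem)    bus_mem = 1'b1;  // then the memory
        else if (bus_req_a)      bus_a   = 1'b1;  // then Cache A
        else if (bus_req_b)      bus_b   = 1'b1;  // then Cache B
    end
endmodule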

e) Cache: The caches are the other important part of this design. There must be several outputs from the Cache to the Cpu and the Memory, and there are several inputs from the other design modules: Cpu, MMU, Memory and Memory Bus Controller.

The relationship between Cache and Cpu: As discussed in the Cpu part, there must be 4 inputs from the Cpu to the Cache: the read, write, address and data bits. And there must be another input from the MMU: the address bits of the physical address.

Output Ports: If the Cpu gives a read signal and sends the address of the data:

i) If the data requested by the processor appears in the Cache, this is called a hit. This information must be given to the Cpu by sending a hit bit, and the found data must be sent back to the Cpu, so the Cache needs an output to the Cpu to send data bits.
ii) If the data is not found in the Cache, the request is called a miss. The memory is then accessed to retrieve the block containing the requested data. This information must be given to the Cpu by sending a miss bit.

This shows that there must be 3 outputs from the Cache to the Cpu (Cpu inputs from the Cache). These are named cac_cpu_name_nameofthecache. The hit signal and the miss signal can each be accomplished with 1 bit. As a result, by considering the above blocks, we can draw the cache block without considering snooping and invalidation, which can be seen in Figure 5. At this point we have designed every required module for the project. I combined these modules in Figure 7, where we can see the general picture.

3) Detailed Information About the Designed Caches: Set Associative Cache Design, Snooping and Invalidation, and Write Back

Where can a block be placed in a cache? As we know, there are three different answers to this question:

1) Direct Mapped Cache Design
2) Fully Associative Cache Design
3) Set Associative Cache Design

In the first project, the design of a simple cache, I used a direct mapped cache design. In this project, I used a two-way set associative cache design to make the implementation more realistic. In this kind of cache design, a block can be placed in a restricted set of places in the cache. Here a set is a group of blocks in the cache. A block is first mapped onto a set, and then the block can be placed anywhere within that set. The set is chosen by bit selection; that is,

(Block address) MOD (Number of sets in the cache, which is 2 in this design)

For every data item there are two blocks for storage at the same index. We can picture this situation as two pages on top of each other, which gives a better understanding of the concept. But it raises another important question: which block should be replaced on a cache miss?

After a miss occurs, the cache controller must select a block to be replaced with the desired data. In our situation there are two possible blocks to replace. As we know, there are three primary strategies for selecting which block to replace:

1) Random
2) Least-Recently Used
3) First In, First Out

In this project I used the second strategy, Least-Recently Used (LRU). With this approach, we reduce the chance of throwing out information that will be needed soon. To achieve this, accesses to blocks must be recorded; I did this with the update bits in the cache entries. Relying on the past to predict the future, the block replaced is the one that has been unused for the longest time. In my design there are two pages, which means there are two possible blocks to replace. I always check the update bits to see which block was recently used; its update bit must be 1. If I write data to an index, I set its update bit to 1, and it is important to set the update bit of the same index in the other page to 0 for later.
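A minimal sketch of this update-bit bookkeeping, under the assumption of one update bit per page at each index; the module and signal names are illustrative:

`timescale 1ns/100ps
// LRU bookkeeping sketch for one index: the accessed page gets
// update = 1 and the other page gets update = 0, so the replacement
// victim is always the page whose update bit is 0.
module lru_sketch (
    input  wire clock,
    input  wire access_1,   // page 1 accessed this cycle
    input  wire access_2,   // page 2 accessed this cycle
    output reg  update_1,
    output reg  update_2,
    output wire victim      // 0 = replace page 1, 1 = replace page 2
);
    always @(posedge clock)
        if (access_1) begin
            update_1 <= 1'b1;  // mark page 1 as recently used
            update_2 <= 1'b0;  // ...and age page 2
        end else if (access_2) begin
            update_2 <= 1'b1;
            update_1 <= 1'b0;
        end

    assign victim = update_1;  // evict the not-recently-used page
endmodule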

Here comes another important question: what happens on a write? As we know, there are two basic options when writing to the cache:

1) Write Through
2) Write Back

I used write through in the simple cache design; it is easier to implement than write back. In that situation the information was written both to the block in the cache and to the block in the memory. However, this project involves other important concepts, which I discuss later in this report, so here I used the write back method. In this method the information is written only to the block in the cache; the modified cache block is written to memory only when it is replaced.

When using the write back method, a new feature must be introduced. Using dirty bits reduces the frequency of writing back blocks on replacement. This status bit indicates whether the block is dirty (modified while in the cache) or clean (not modified). If it is clean, the block is not written back on a miss, because identical information is already in memory. This is another bit in the cache entry. Thus, I used the advantage of the write back method, which is lower memory bandwidth usage. The cache entry used in this project is shown in the figure below (Figure: Cache Entry). The additions compared to the first project are the dirty bit and the update bit described above.
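For reference, the bit slices below match the ones used by the Cache_A code in the Codes section (bit 8 appears unused there); this is only a field-extraction sketch of the 11-bit entry:

// Cache entry layout: [10] valid, [9] dirty, [7] update,
// [6:5] tag, [4:0] data.
module entry_fields_sketch (
    input  wire [10:0] entry,
    output wire        valid, dirty, update,
    output wire [1:0]  tag,
    output wire [4:0]  data
);
    assign valid  = entry[10];
    assign dirty  = entry[9];
    assign update = entry[7];
    assign tag    = entry[6:5];
    assign data   = entry[4:0];
endmodule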

After deciding on the cache entry, we need to consider how a block is found if it is in the cache.

Snooping Protocols: This is the most important part of the project. Snooping protocols maintain coherence for multiple processors; the general name for these protocols is cache coherence protocols. The key to implementing a cache coherence protocol is tracking the sharing state of any data block. There are two classes of protocols:

1) Directory based
2) Snooping

In my project, we consider the snooping protocols. In this scheme, every cache that has a copy of the data from a block of physical memory also has a copy of the sharing status of the block, and no centralized state is kept. The point is that the caches are on a shared-memory bus, and all cache controllers snoop on the bus to determine whether or not they have a copy of a block that is requested on the bus.

To maintain the coherence requirement in snooping protocols, there are two methods:

1) Write invalidate protocol
2) Write update (write broadcast) protocol

In my design, I used the write invalidate protocol as described in our lectures. In the write invalidate protocol, a processor has exclusive access to a data item before it writes that item. The name is write invalidate because it invalidates other copies on a write. Exclusive access ensures that no other readable or writable copies of an item exist when the write occurs, because all other cached copies of the item have already been invalidated.

For invalidation, the processor simply acquires bus access and broadcasts the address to be invalidated on the bus. All processors continuously snoop on the bus, watching the addresses. The processors check whether the address on the bus is in their cache; if so, the corresponding data in the cache is invalidated.
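A minimal sketch of that check, using the same tag/index split as the Cache_A code; the entry input is assumed to be the one already selected by the snoop index, and the names are illustrative:

// Snoop-check sketch: a valid tag match is a snoop hit, and a hit on
// a dirty copy means a priority write-back is needed.
module snoop_check_sketch (
    input  wire [4:0]  snoop_add,      // address seen on the bus
    input  wire [10:0] entry,          // entry selected by snoop_add[4:2]
    output wire        snoop_hit,
    output wire        need_writeback
);
    wire [1:0] snoop_tag = snoop_add[1:0];

    assign snoop_hit      = (entry[6:5] == snoop_tag) && entry[10]; // tag match & valid
    assign need_writeback = snoop_hit && entry[9];                  // copy is dirty
endmodule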

Cache State Transitions: We have three states, which can be seen in Figure 9: Invalid, Shared and Exclusive.

In the Invalid state there are two possible events for the addressed cache block: a read miss and a write miss.

Suppose a Cpu requests a read and a read miss occurs. Then the read miss must be placed on the bus, and after the data is stored in the cache, the state must change to Shared.

Suppose a Cpu requests a write and a write miss occurs. Then the write miss must be placed on the bus, and after the data is stored in the cache, the state must change to Exclusive.

In the Shared state there are three possible events for the addressed cache block: a read miss, a read hit and a write miss.

Suppose a Cpu requests a read and a read miss occurs. Then the read miss must be placed on the bus. After the data is stored in the cache, the state stays Shared.

Suppose a Cpu requests a read and a read hit occurs. Then the state stays Shared.

Suppose a Cpu requests a write and a write miss occurs. Then the write miss must be placed on the bus. After the data is stored in the cache, the state must change to Exclusive.

In the Exclusive state there are four possible events for the addressed cache block: a read miss, a read hit, a write hit and a write miss.

Suppose a Cpu requests a read and a read miss occurs. Then the read miss must be placed on the bus. After the data is stored in the cache, the state must change to Shared.

Suppose a Cpu requests a read and a read hit occurs. Then the state stays Exclusive.

Suppose a Cpu requests a write and a write miss occurs. Then the write miss must be placed on the bus. After the data is stored in the cache, the state stays Exclusive.

Suppose a Cpu requests a write and a write hit occurs. Then the state stays Exclusive.

These transitions can be seen clearly in Figure 9.
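The transitions above can be summarized as a next-state function. This is a compact sketch, not the exact per-page logic of the Cache_A code:

// Next-state sketch of the Invalid/Shared/Exclusive transitions.
module msi_next_state_sketch (
    input  wire [1:0] state,
    input  wire       read_miss, read_hit, write_miss, write_hit,
    output reg  [1:0] next_state
);
    localparam INVALID = 2'd0, SHARED = 2'd1, EXCLUSIVE = 2'd2;

    always @(*) begin
        next_state = state;                          // hits leave the state alone...
        if      (read_miss)  next_state = SHARED;    // fill the block, then share it
        else if (write_miss) next_state = EXCLUSIVE; // fill the block, then own it
        else if (write_hit)  next_state = EXCLUSIVE; // ...except a write, which ends Exclusive
    end
endmodule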

4) Test Benches

1) Testing read processes: the goal is to see that the Memory Bus Controller works properly.

In my design, the cache entries are initially filled with zeros. As we know, I have two different Cpus. These Cpus send read requests at the same time to different caches: Cpu A sends a request to Cache A and Cpu B sends a request to Cache B. Because both caches are filled with zeros, read misses occur. In this situation each cache needs to access the memory to retrieve data. However, simultaneous access is not possible: the Memory Bus Controller gives control to one of the caches. The data is retrieved from the memory (then the grant is given to the other cache) and stored in the first page of the granted cache. After this process, the same read requests are sent again with the same addresses, and read hits must occur.

The process:
1) Cpu A requests the data at one address and Cpu B requests the data at another address.
2) Because both caches are filled with zeros, no tags match and read misses occur.
3) Both caches send requests to the Memory Bus Controller: bus_req_A = 1 and bus_req_B = 1.
4) First the bus is granted to Cache A.
5) cac_mem_rd_A is set to 1, which means that the Cache will read the data at the desired address. After this, the bus is granted to Cache B.
6) cac_memAdd_A is set to the requested address. The memory holds the desired data at that address; thus mem_cac_data_A is loaded with it.
7) The data is stored in page 0 of Cache A and sent to the Cpu; cac_cpu_data_A is set to the retrieved value.
8) After the data is stored in both caches, the same read request occurs again, and this results in a read hit in Cache A.

A read hit occurred after the read misses. The requests are handled by the memory bus controller; the coordination between the caches, the memory bus controller and the memory works properly.
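The stimulus pattern behind this test looks like the sketch below. The addresses are arbitrary examples, since the report's actual values were lost in transcription, and the device-under-test instantiations are omitted here (the full test bench appears in the Codes section):

`timescale 1ns / 100ps
// Stimulus-only sketch of Test Bench 1: pulse both read requests for
// one clock period, wait for the misses to be serviced, then request
// the same addresses again and expect read hits.
module tb1_stimulus_sketch;
    reg clock;
    reg cpu_mmu_rd_a, cpu_mmu_rd_b;
    reg [6:0] cpu_mmu_add_a, cpu_mmu_add_b;

    always #5 clock = ~clock;

    initial begin
        clock = 1'b0;
        cpu_mmu_rd_a = 1'b0;  cpu_mmu_rd_b = 1'b0;
        cpu_mmu_add_a = 7'b0; cpu_mmu_add_b = 7'b0;
        // first pass: both CPUs read from cold caches -> read misses
        #10 begin
            cpu_mmu_rd_a = 1'b1; cpu_mmu_add_a = 7'b0000101; // example address
            cpu_mmu_rd_b = 1'b1; cpu_mmu_add_b = 7'b0001001; // example address
        end
        #10 begin cpu_mmu_rd_a = 1'b0; cpu_mmu_rd_b = 1'b0; end
        #100;  // let the bus controller and memory service both misses
        // second pass: same addresses -> read hits expected
        #10 begin cpu_mmu_rd_a = 1'b1; cpu_mmu_rd_b = 1'b1; end
        #10 begin cpu_mmu_rd_a = 1'b0; cpu_mmu_rd_b = 1'b0; end
        #100 $finish;
    end
endmodule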

2) Testing read processes: the goal is to see that the two-way associativity works properly.

This test bench builds on Test Bench 1. In Test Bench 1, the Cpus requested data at the same address two times. What happens if the Cpus request data from a different address in between the two previously used requests? On the first requests, cache misses must occur, and the memory bus must be granted to one of the caches. After retrieving the data from the memory for both caches, the data must be written to the first pages of the caches. If the caches then request other data from a different address, the new data must be written to the second pages of the caches. Finally, if a Cpu requests the data from the first address again, there must be a hit. This shows that the two-way associativity works properly and the update bits are maintained.

1) Cpu A requests the data at one address and Cpu B requests the data at another address.
2) Because both caches are filled with zeros, no tags match and read misses occur.
3) Both caches send requests to the Memory Bus Controller: bus_req_A = 1 and bus_req_B = 1.
4) First the bus is granted to Cache A.
5) cac_mem_rd_A is set to 1, which means that the Cache will read the data at the desired address. After this, the bus is granted to Cache B.
6) cac_memAdd_A is set to the requested address. The memory holds the desired data at that address; thus mem_cac_data_A is loaded with it.
7) The data is stored in page 0 of Cache A and sent to the Cpu; cac_cpu_data_A is set to the retrieved value.
8) Cpu A requests data at a new address and Cpu B requests data at a new address.
9) No tags match and read misses occur.
10) Both caches send requests to the Memory Bus Controller: bus_req_A = 1 and bus_req_B = 1.
11) The bus is granted to Cache A again.
12) cac_mem_rd_A is set to 1, which means that the Cache will read the data at the desired address. After this, the bus is granted to Cache B.
13) cac_memAdd_A is set to the new address. The memory holds the desired data at that address; thus mem_cac_data_A is loaded with it.
14) The data is stored in page 1 of Cache A and sent to the Cpu; cac_cpu_data_A is set to the retrieved value.
15) Cpu A and Cpu B again request the data from the first addresses they used.
16) This results in a read hit in Cache A.

A read hit occurred after two read misses. Because of the two-way associative cache organization, the first data is stored in the first page and the second data in the second page, and the repeated request for the first data completes successfully.

3) Testing write processes: In this test, both Cpus request writes to their caches. The first write attempt results in a write miss; as a result the data is written to the first pages of the caches, the write back to the memory is performed, and the invalidation is broadcast. The second write attempt to the same address results in a write hit.

1) Cpu A requests a write at one address with one data value, and Cpu B requests a write at another address with another data value.
2) Write misses occur.

3) Both caches send requests to the Memory Bus Controller: bus_req_A = 1 and bus_req_B = 1.
4) First the bus is granted to Cache A.
5) cac_mem_wrt_A is set to 1 and cac_mem_data_A is set to the written value.
6) After the data is written to the cache and to the memory with the write back, the invalidation is performed.
7) Then Cpu A requests a write at the same address with the same data, and Cpu B does the same.
8) Write hits occur and no write back is performed.
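The invalidation step in this sequence can be sketched as follows, using the same tag/index split, valid-bit position and masking idiom as the Cache_A code; only one page is shown, and the names are illustrative:

// Invalidation sketch for one cache page: a matching, valid copy has
// its valid bit cleared when the other cache broadcasts an invalidate.
module invalidate_sketch (
    input wire       clock,
    input wire       invalidate_in,  // invalidation broadcast from the other cache
    input wire [4:0] snoop_add       // address being invalidated
);
    reg  [10:0] buffer_1 [0:7];              // one page of 8 entries
    wire [1:0]  snoop_tag   = snoop_add[1:0];
    wire [2:0]  snoop_index = snoop_add[4:2];

    always @(negedge clock)
        if (invalidate_in
            && buffer_1[snoop_index][6:5] == snoop_tag  // tag matches
            && buffer_1[snoop_index][10])               // copy is valid
            // clear the valid bit (bit 10) and keep the rest
            buffer_1[snoop_index] <= buffer_1[snoop_index] & 11'b01111111111;
endmodule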

4) Testing snooping and write-back: Any transition to the Exclusive state, which is required for a processor to write the block, requires a write miss to be placed on the bus, causing all other caches to make the block invalid. In addition, if some other cache had the block in the Exclusive state, that cache generates a write back, which supplies the block containing the desired address.

1) Cpu A requests a write at an address with some data.

2) A write miss occurs.
3) Cache A sends a request to the memory bus controller: bus_req_A = 1.
4) The bus is granted to Cache A.
5) cac_mem_wrt_A is set to 1 and cac_mem_data_A is set to the written value.
6) After the data is written to the cache and to the memory with the write back, the invalidation is performed.
7) Then Cpu A requests a write at the same address with different data.
8) A write hit occurs and no write back is performed; however, this makes the data dirty.
9) Then Cpu B reads the same address, which triggers snooping.
10) Cache A is required to priority write back the dirty data to memory.


5) Figures

Figure 1: The Resulting CPU Module
Figure 2: The Resulting Memory Mapping Module
Figure 3: The Resulting Memory Module
Figure 4: The Resulting Memory Bus Controller Module
Figure 5: The Resulting Cache Module
Figure 6: The Relationship between Cache A and Cache B
Figure 7: General Design of the Project
Figure 8: The address sent by the Cpu, matched against both cache pages; the Cpu Tag and Cpu Index together form the address sent by the Cpu.
Figure 9: Cache State Transitions

Figure 1: The Resulting CPU Module

Figure 2: The Resulting Memory Mapping Module

Figure 3: The Resulting Memory Module

Figure 4: The Resulting Memory Bus Controller Module

Figure 5: The Resulting Cache Module

Figure 6: The Relationship between Cache A and Cache B

Figure 7: General Design of the Project

Figure 8: The address sent by the Cpu, matched against both cache pages; the Cpu Tag and Cpu Index together form the address sent by the Cpu.

Figure 9: Cache State Transitions

6) Codes:

Cache_A

`timescale 1ns/100ps
// Module
module Cache_A(
    //Global inputs
    clock,
    rst_l,               //Asynchronous active low reset
    //Inputs from CPU
    cpu_cac_add_a, cpu_cac_rd_a, cpu_cac_wrt_a, cpu_cac_data_a,
    //Inputs from Main Memory
    mem_cac_data_a, data_avail_mema,
    //Inputs from mmu
    mmu_cac_add_a,
    //Inputs to the Memory Bus Controller
    bus_bus_bus_req_a, priority_bus_bus_req_a,
    //Snooping input ports
    snoop_b, snoop_add_b, invalidate_b,
    //Outputs to CPU
    cac_cpu_hit_a, cac_cpu_miss_a, cac_cpu_data_a,
    //Outputs to Main Memory
    cac_data_avail_mem_add_a, cac_mem_data_a, cac_mem_rd_a, cac_mem_wrt_a, priority_wrt_a,
    //Outputs from Memory Bus Controller
    bus_a,
    //Snooping output ports
    snoop_a, snoop_add_a, invalidate_a
);

// Input Ports

//Global
input clock;
input rst_l;
//CPU
input [4:0] cpu_cac_data_a;
input [4:0] cpu_cac_add_a;
input cpu_cac_rd_a;
input cpu_cac_wrt_a;
// Memory
input [4:0] mem_cac_data_a;
input data_avail_mema;
// MMU
input [4:0] mmu_cac_add_a;
//Memory Bus Controller
input bus_a;
//Snooping
input snoop_b;
input [4:0] snoop_add_b;
input invalidate_b;

// Output Ports
//CPU
output [4:0] cac_cpu_data_a;
output cac_cpu_hit_a;
output cac_cpu_miss_a;
// Memory
output [4:0] cac_mem_data_a;
output [4:0] cac_data_avail_mem_add_a;
output cac_mem_rd_a;
output cac_mem_wrt_a;
output priority_wrt_a;
//Memory Bus Controller
output bus_bus_bus_req_a;
output priority_bus_bus_req_a;
//Snooping
output snoop_a;
output [4:0] snoop_add_a;
output invalidate_a;

// Registers
//CPU
reg [4:0] cac_cpu_data_a;
reg cac_cpu_hit_a;
reg cac_cpu_miss_a;
// Memory
reg [4:0] cac_mem_data_a;
reg [4:0] cac_memadd_a;
reg cac_mem_rd_a;
reg cac_mem_wrt_a;
reg priority_wrt_a;
//Memory Bus Controller
reg bus_bus_bus_req_a;
reg priority_bus_bus_req_a;

//Snooping
reg snoop_a;
reg [4:0] snoop_add_a;
reg snoop_page_1;
reg snoop_page_2;
reg invalidate_a;
//Cache Buffer
reg [10:0] buffer_1 [0:7];
reg [10:0] buffer_2 [0:7];
//Read/Write regs
reg read_a;
reg write_a;
reg dirty;
//Cache FSM
reg [2:0] state;
reg [2:0] next_s;
reg [2:0] back_t_s;

// Nets
//Cache-CPU Interface
wire [10:0] cpu_buffer_1;   //Buffer value at current index in page 0
wire [10:0] cpu_buffer_2;   //Buffer value at current index in page 1
wire [2:0] cpu_index;       //Index value from CPU address
wire [1:0] cpu_tag;         //Tag value from CPU address
wire [1:0] cur_tag_1;
wire [1:0] cur_tag_2;
wire [4:0] add_mem;
wire [4:0] cac_data;        //Current Cache Data
wire [4:0] mem_data;
wire [1:0] snoop_tag;
wire [2:0] snoop_index;
wire [10:0] snoop_buffer_1;
wire [10:0] snoop_buffer_2;
wire [10:0] cac_data_1;
wire [10:0] cac_data_2;
wire valid_1;
wire valid_2;
wire dirty_1;
wire dirty_2;
wire update_1;
wire update_2;
//Integer
integer i;
//Parameters
parameter S0 = 0; // Initial
parameter S1 = 1; // Wait State 1
parameter S2 = 2; // Store State
parameter S3 = 3; // Wait State 2
parameter S4 = 4;
parameter S5 = 5; //priority_write state

37 //Assign //Cache-CPU assign cpu_tag = cpu_cac_add_a[1:0]; assign cur_tag_1 = cpu_buffer_1[6:5]; assign cur_tag_2 = cpu_buffer_2[6:5]; assign cpu_index = cpu_cac_add_a[4:2]; assign add_mem = cpu_cac_add_a[4:0]; assign valid_1 = cpu_buffer_1[10]; assign valid_2 = cpu_buffer_2[10]; assign dirty_1 = cpu_buffer_1[9]; assign dirty_2 = cpu_buffer_2[9]; assign update_1 = cpu_buffer_1[7]; assign update_2 = cpu_buffer_2[7]; assign cpu_buffer_1 = buffer_1[cpu_index]; assign cpu_buffer_2 = buffer_2[cpu_index]; //Cache- Memory assign cac_data_1 = buffer_1[cpu_index]; assign cac_data_2 = buffer_2[cpu_index]; assign mem_data = mem_cac_data_a[4:0]; assign snoop_tag = snoop_add_b[1:0]; assign snoop_index = snoop_add_b[4:2]; assign snoop_buffer_1 = buffer_1[snoop_index]; assign snoop_buffer_2 = buffer_2[snoop_index]; //Begin // Cache_A // // 2 phase clock clock) next_s <= state; clock or negedge rst_l) if(rst_l == 0) //Store the initials cac_mem_rd_a <= 1'b0; cac_mem_wrt_a <= 1'b0; cac_cpu_data_a <= 5'b0; cac_cpu_miss_a <= 1'b0; cac_cpu_hit_a <= 1'b0; cac_mem_data_a <= 5'b0; cac_memadd_a <= 5'b0; snoop_a <= 1'b0; invalidate_a <= 1'b0; bus_bus_bus_req_a <= 1'b0; for(i=0; i <= 7; i = i+1) //Store 0 s into both pages buffer_1[i] <= 11'b0; 37

38 buffer_2[i] <= 11'b0; else // If its not a hard reset case(next_s) // State S0 - Initial S0: Begin //store initials again cac_cpu_data_a <= 5'b0; cac_cpu_miss_a <= 1'b0; cac_cpu_hit_a <= 1'b0; cac_mem_data_a <= 5'b0; cac_memadd_a <= 5'b0; cac_mem_rd_a <= 1'b0; cac_mem_wrt_a <= 1'b0; priority_bus_bus_req_a <= 1'b0; snoop_a <= 1'b0; snoop_page_1 <= 1'b0; snoop_page_2 <= 1'b0; invalidate_a <= 1'b0; read_a <= 1'b0; write_a <= 1'b0; dirty <= 1'b0; priority_wrt_a <= 1'b0; bus_bus_bus_req_a <= 1'b0; / //If it is a read if(cpu_cac_rd_a == 1'b1) read_a <= 1'b1; if(cpu_tag == cur_tag_1) // For Page 1 //read miss in INVALID state if(valid_1 == 1'b0) $display ("Read Miss In Invalid State PAGE 1"); cac_cpu_miss_a <= 1'b1; bus_bus_bus_req_a <= 1'b1; //read hit in EXCLUSIVE state if((dirty_1 == 1'b1) & (valid_1 == 1'b1)) $display ("Read Hit In Exculsive State PAGE 1"); cac_cpu_hit_a <= 1'b1; cac_cpu_data_a <= cpu_buffer_1[4:0]; buffer_1[cpu_index] <= buffer_1[cpu_index] 11'b ; buffer_2[cpu_index] <= buffer_2[cpu_index] & 11'b ; //read hit in SHARED state 38

39 if((dirty_1 == 1'b0) & (valid_1 == 1'b1)) $display ("Read Hit In Shared State PAGE 1"); cac_cpu_hit_a <= 1'b1; cac_cpu_data_a <= cpu_buffer_1[4:0]; buffer_1[cpu_index] <= buffer_1[cpu_index] 11'b ; buffer_2[cpu_index] <= buffer_2[cpu_index] & 11'b ; if(cpu_tag == cur_tag_2) //Check Page 2 of the Cache if(valid_2 == 1'b0) $display ("Read Miss In Invalid State PAGE 2"); cac_cpu_miss_a <= 1'b1; bus_bus_bus_req_a <= 1'b1; //read hit in EXCLUSIVE state if((dirty_2 == 1'b1) & (valid_2 == 1'b1)) $display ("Read Hit In Exculsive State PAGE 2"); cac_cpu_hit_a <= 1'b1; cac_cpu_data_a <= cpu_buffer_2[4:0]; buffer_2[cpu_index] <= buffer_2[cpu_index] 11'b ; buffer_1[cpu_index] <= buffer_1[cpu_index] & 11'b ; //read hit in SHARED state if((dirty_2 == 1'b0) & (valid_2 == 1'b1)) $display ("Read Hit In Shared State PAGE 2"); cac_cpu_hit_a <= 1'b1; cac_cpu_data_a <= cpu_buffer_2[4:0]; buffer_2[cpu_index] <= buffer_2[cpu_index] 11'b ; buffer_1[cpu_index] <= buffer_1[cpu_index] & 11'b ; if((cpu_tag!= cur_tag_1) & (cpu_tag!= cur_tag_2)) $display ("Read Miss In Invalid State"); cac_cpu_miss_a <= 1'b1; bus_bus_bus_req_a <= 1'b1; //if it is a WRITE if(cpu_cac_wrt_a == 1'b1) write_a <= 1'b1; if(cpu_tag == cur_tag_1) //write miss in INVALID state if(valid_1 == 1'b0) $display ("Write Miss In Invalid State PAGE 1"); cac_cpu_miss_a <= 1'b1; 39

40 bus_bus_bus_req_a <= 1'b1; //write miss in EXCLUSIVE state else if((dirty_1 == 1'b1) & (valid_1 == 1'b1)) $display ("Write Hit In Exculsive State PAGE 1"); cac_cpu_hit_a <= 1'b1; buffer_1[cpu_index] <= {1'b1,1'b1,1'b0,1'b1,cpu_tag,cpu_cac_data_A}; buffer_2[cpu_index] <= buffer_2[cpu_index] & 11'b ; //write hit in SHARED state else if((dirty_1 == 1'b0) & (valid_1 == 1'b1)) $display ("Write Hit In Shared State PAGE 1"); buffer_1[cpu_index] <= {1'b1,1'b1,1'b0,1'b1,cpu_tag,cpu_cac_data_A}; buffer_2[cpu_index] <= buffer_2[cpu_index] & 11'b ; bus_bus_bus_req_a <= 1'b1; back_t_s <= S0; state <= S4; else if(cpu_tag == cur_tag_2) if(valid_2 == 1'b0) $display ("Write Miss In Invalid State PAGE 2"); cac_cpu_miss_a <= 1'b1; bus_bus_bus_req_a <= 1'b1; //write hit in EXCLUSIVE state else if((dirty_2 == 1'b1) & (valid_2 == 1'b1)) $display ("Write Hit In Exculsive State PAGE 2"); cac_cpu_hit_a <= 1'b1; buffer_2[cpu_index] <= {1'b1,1'b1,1'b0,1'b1,cpu_tag,cpu_cac_data_A}; buffer_1[cpu_index] <= buffer_1[cpu_index] & 11'b ; //write hit in SHARED state else if((dirty_2 == 1'b0) & (valid_2 == 1'b1)) $display ("Write Hit In Shared State PAGE 2"); buffer_2[cpu_index] <= {1'b1,1'b1,1'b0,1'b1,cpu_tag,cpu_cac_data_A}; buffer_1[cpu_index] <= buffer_1[cpu_index] & 11'b ; bus_bus_bus_req_a <= 1'b1; back_t_s <= S0; state <= S4; else if((cpu_tag!= cur_tag_1) & (cpu_tag!= cur_tag_2)) $display ("Write Miss In Invalid state"); cac_cpu_miss_a <= 1'b1; bus_bus_bus_req_a <= 1'b1; //if snooping 40

41 if(snoop_b == 1'b1) if((snoop_tag == snoop_buffer_1[6:5])&(snoop_buffer_1[10] == 1'b1)) if(snoop_buffer_1[9] == 1'b1) priority_bus_bus_req_a <= 1'b1; snoop_page_1 <= 1'b1; back_t_s <= S0; state <= S3; else else if((snoop_tag == snoop_buffer_2[6:5])&(snoop_buffer_2[10] == 1'b1)) if(snoop_buffer_2[9] == 1'b1) priority_bus_bus_req_a <= 1'b1; snoop_page_2 <= 1'b1; back_t_s <= S0; state <= S3; else else //if Invalidation if(invalidate_b == 1'b1) if((snoop_tag == snoop_buffer_1[6:5])&(snoop_buffer_1[10] == 1'b1)) buffer_1[snoop_index] <= (buffer_1[snoop_index] & 11'b ); buffer_2[snoop_index] <= (buffer_2[snoop_index] 11'b ); else if((snoop_tag == snoop_buffer_2[6:5])&(snoop_buffer_1[10] == 1'b1)) buffer_2[snoop_index] <= (buffer_2[snoop_index] & 11'b ); buffer_1[snoop_index] <= (buffer_1[snoop_index] 11'b ); else //State 0 //State S0 - Wait S0: 41

42 cac_cpu_miss_a <= 1'b0; priority_wrt_a <= 1'b0; $display ("Waiting in S0"); if(bus_a == 1'b1) cac_mem_rd_a <= 1'b1; cac_memadd_a <= add_mem; snoop_a <= 1'b1; snoop_add_a <= add_mem; bus_bus_bus_req_a <= 1'b0; state <= S2; //if invalidation else if(invalidate_b == 1'b1) if((snoop_tag == snoop_buffer_1[6:5])&(snoop_buffer_1[10] == 1'b1)) buffer_1[snoop_index] <= (buffer_1[snoop_index] & 11'b ); buffer_2[snoop_index] <= (buffer_2[snoop_index] 11'b ); if(snoop_add_b == cpu_cac_add_a) bus_bus_bus_req_a <= 1'b0; cac_cpu_miss_a <= 1'b1; else else if((snoop_tag == snoop_buffer_2[6:5])&(snoop_buffer_2[10] == 1'b1)) buffer_2[snoop_index] <= (buffer_2[snoop_index] & 11'b ); buffer_1[snoop_index] <= (buffer_1[snoop_index] 11'b ); if(snoop_add_b == cpu_cac_add_a) bus_bus_bus_req_a <= 1'b0; cac_cpu_miss_a <= 1'b1; else else else //S0 //State S2 42

43 S2: dirty <= 1'b0; snoop_a <= 1'b0; cac_mem_wrt_a <= 1'b0; cac_mem_rd_a <= 1'b0; snoop_page_1 <= 1'b0; snoop_page_2 <= 1'b0; priority_wrt_a <= 1'b0; if((data_avail_mema == 1'b1) (dirty == 1'b1)) if(update_1 == 1'b0) //Data needs to be written in page 1 if(dirty_1 == 1'b1) cac_mem_wrt_a <= 1'b1; cac_memadd_a <= add_mem; cac_mem_data_a <= cpu_cac_data_a; buffer_1[cpu_index] <= buffer_1[cpu_index] & 11'b ; dirty <= 1'b1; state <= S2; else if(dirty_1 == 1'b0) buffer_1[cpu_index] <= {1'b1,1'b0,1'b0,1'b1,cpu_tag,mem_cac_data_A}; buffer_2[cpu_index] <= buffer_2[cpu_index] & 11'b ; if((read_a == 1'b1) & (write_a == 1'b0)) else if((read_a == 1'b0) & (write_a == 1'b1)) buffer_1[cpu_index] <= {1'b1,1'b1,1'b0,1'b1,cpu_tag,cpu_cac_data_A}; cac_mem_wrt_a <= 1'b1; cac_memadd_a <= add_mem; cac_mem_data_a <= cpu_cac_data_a; invalidate_a <= 1'b1; snoop_add_a <= add_mem; else if(update_2 == 1'b0) if(dirty_2 == 1'b1) cac_mem_wrt_a <= 1'b1; cac_memadd_a <= add_mem; cac_mem_data_a <= cpu_cac_data_a; buffer_2[cpu_index] <= buffer_2[cpu_index] & 11'b ; dirty <= 1'b1; state <= S2; else if(dirty_2 == 1'b0) buffer_2[cpu_index] <= {1'b1,1'b0,1'b0,1'b1,cpu_tag,mem_cac_data_A}; buffer_1[cpu_index] <= buffer_1[cpu_index] & 11'b ; if((read_a == 1'b1) & (write_a == 1'b0)) 43

44 if((read_a == 1'b0) & (write_a == 1'b1)) buffer_2[cpu_index] <= {1'b1,1'b1,1'b0,1'b1,cpu_tag,cpu_cac_data_A}; cac_mem_wrt_a <= 1'b1; cac_memadd_a <= add_mem; cac_mem_data_a <= cpu_cac_data_a; invalidate_a <= 1'b1; snoop_add_a <= add_mem; //if snooping else if(snoop_b == 1'b1) if((snoop_tag == snoop_buffer_1[6:5]) & (snoop_buffer_1[10] == 1'b1)) if(snoop_buffer_1[9] == 1'b1) priority_bus_bus_req_a <= 1'b1; snoop_page_1 <= 1'b1; back_t_s <= S2; state <= S3; else state <= S2; else if((snoop_tag == snoop_buffer_2[6:5]) & (snoop_buffer_2[10] == 1'b1)) if(snoop_buffer_2[9] == 1'b1) priority_bus_bus_req_a <= 1'b1; snoop_page_2 <= 1'b1; back_t_s <= S2; state <= S3; else state <= S2; else state <= S2; //if invalidation else if(invalidate_b == 1'b1) if((snoop_tag == snoop_buffer_1[6:5])& (snoop_buffer_1[10] == 1'b1)) 44

45 buffer_1[snoop_index] <= buffer_1[snoop_index] & 11'b ; buffer_2[snoop_index] <= buffer_2[snoop_index] 11'b ; state <= S2; else if((snoop_tag == snoop_buffer_2[6:5]) & (snoop_buffer_2[10] == 1'b1)) buffer_2[snoop_index] <= buffer_2[snoop_index] & 11'b ; buffer_1[snoop_index] <= buffer_1[snoop_index] 11'b ; state <= S2; else state <= S2; // S2 // State S3 S3: if(bus_a == 1'b1) priority_bus_bus_req_a <= 1'b0; if((snoop_page_1 == 1'b1)&(snoop_page_2 == 1'b0)) priority_wrt_a <= 1'b1; cac_memadd_a <= snoop_add_b; cac_mem_data_a <= snoop_buffer_1[4:0]; buffer_1[snoop_index] <= buffer_1[snoop_index] & 11'b ; state <= S5; else if((snoop_page_1 == 1'b0)&(snoop_page_2 == 1'b1)) priority_wrt_a <= 1'b1; cac_memadd_a <= snoop_add_b; cac_mem_data_a <= snoop_buffer_2[4:0]; buffer_2[snoop_index] <= buffer_2[snoop_index] & 11'b ; state <= S5; else state <= S3; //S3 //State S4 S4: if(bus_a == 1'b1) bus_bus_bus_req_a <= 1'b0; cac_cpu_hit_a <= 1'b1; invalidate_a <= 1'b1; snoop_add_a <= add_mem; cac_mem_wrt_a <= 1'b1; 45

        cac_memadd_a <= add_mem;
        cac_mem_data_a <= cpu_cac_data_a;
        state <= back_t_s;
        // S4

    //State S5
    S5: state <= back_t_s;

endcase
endmodule //Cache_A

Memory Module

`timescale 1ns/100ps
module Memory (
    clock, rst_1,
    cac_data_avail_mem_add_a, cac_mem_data_a, cac_mem_rd_a, cac_mem_wrt_a,
    cac_data_avail_mem_add_b, cac_mem_data_b, cac_mem_rd_b, cac_mem_wrt_b,
    mem_cac_data_b, mem_cac_data_a,
    bus_bus_mem,
    priority_wrt_a, priority_wrt_b,
    mema, memb,
    bus_mem
);

//Inputs
input clock, rst_1;
input cac_mem_rd_a, cac_mem_wrt_a, cac_mem_rd_b, cac_mem_wrt_b;
input [4:0] cac_data_avail_mem_add_a;
input [4:0] cac_data_avail_mem_add_b;
input [4:0] cac_mem_data_a;
input [4:0] cac_mem_data_b;
input priority_wrt_a;
input priority_wrt_b;
input bus_mem;

//Outputs
output [4:0] mem_cac_data_a;
output [4:0] mem_cac_data_b;
output bus_bus_mem;
output mema;
output memb;

//Registers
reg [0:9] memarray [31:0];
reg [4:0] mem_cac_data_a;
reg [4:0] mem_cac_data_b;
reg [4:0] mem_cache_data;
reg mema;
reg memb;
reg bus_bus_mem;
reg ready_bit_a;
reg ready_bit_b;
reg nexta;
reg nextb;
reg [2:0] state;
reg [4:0] add;

reg [4:0] next_add;

//Internals
parameter S0 = 0;
parameter S1 = 1;
parameter S2 = 2;
parameter S3 = 3;
parameter S4 = 4;

//Memory
always @(posedge clock or negedge rst_1)  // clock edge reconstructed; lost in transcription
if (~rst_1) begin
    // initial memory contents (the 10-bit literals were lost in transcription)
    memarray[31] = 10'b ; memarray[30] = 10'b ; memarray[29] = 10'b ; memarray[28] = 10'b ;
    memarray[27] = 10'b ; memarray[26] = 10'b ; memarray[25] = 10'b ; memarray[24] = 10'b ;
    memarray[23] = 10'b ; memarray[22] = 10'b ; memarray[21] = 10'b ; memarray[20] = 10'b ;
    memarray[19] = 10'b ; memarray[18] = 10'b ; memarray[17] = 10'b ; memarray[16] = 10'b ;
    memarray[15] = 10'b ; memarray[14] = 10'b ; memarray[13] = 10'b ; memarray[12] = 10'b ;
    memarray[11] = 10'b ; memarray[10] = 10'b ; memarray[9] = 10'b ; memarray[8] = 10'b ;
    memarray[7] = 10'b ; memarray[6] = 10'b ; memarray[5] = 10'b ; memarray[4] = 10'b ;
    memarray[3] = 10'b ; memarray[2] = 10'b ; memarray[1] = 10'b ; memarray[0] = 10'b ;
    mem_cac_data_a <= 5'b0;
    mem_cac_data_b <= 5'b0;
    mema <= 1'b0;
    memb <= 1'b0;
    nexta <= 1'b0;
    nextb <= 1'b0;
    add <= 5'b0;
    bus_bus_mem <= 1'b0;
end
else

49 case(state) S0: mema <= 1'b0; memb <= 1'b0; ready_bit_a <= 1'b0; ready_bit_b <= 1'b0; bus_bus_mem <= 1'b0 add <= 5'b0; ; // IF WRITE if (priority_wrt_a == 1'b1) memarray[cac_data_avail_mem_add_a] <= {cac_data_avail_mem_add_a, cac_mem_data_a}; else if (priority_wrt_b == 1'b1) memarray[cac_data_avail_mem_add_b] <= {cac_data_avail_mem_add_b, cac_mem_data_b}; else if ((cac_mem_wrt_a == 1'b1) & (cac_mem_rd_a == 1'b0)) memarray[cac_data_avail_mem_add_a] <= {cac_data_avail_mem_add_a, cac_mem_data_a}; else if ((cac_mem_wrt_b == 1'b1) & (cac_mem_rd_b == 1'b0)) memarray[cac_data_avail_mem_add_b] <= {cac_data_avail_mem_add_b, cac_mem_data_b}; // FINISH WRITE else if ((cac_mem_wrt_a == 1'b0) & ((cac_mem_rd_a == 1'b1) (nexta == 1'b1))) if ((cac_mem_rd_a == 1'b1)&(nextA == 1'b0)) add <= cac_data_avail_mem_add_a; ready_bit_a <= cac_mem_rd_a; else if ((cac_mem_rd_a == 1'b0)&(nextA == 1'b1)) add <= next_add; ready_bit_a <= 1'b1; nexta <= 1'b0; else if ((cac_mem_wrt_b == 1'b0) & ((cac_mem_rd_b == 1'b1) (nextb == 1'b1))) 49

50 if ((cac_mem_rd_b == 1'b1)&(nextB == 1'b0)) add <= cac_data_avail_mem_add_b; ready_bit_b <= cac_mem_rd_b; else if ((cac_mem_rd_b == 1'b0)&(nextB == 1'b1)) add <= next_add; ready_bit_b <= 1'b1; nextb <= 1'b0; else //S0 S0: bus_bus_mem <= 1'b1; if(priority_wrt_a == 1'b1) memarray[cac_data_avail_mem_add_a] <= {cac_data_avail_mem_add_a, cac_mem_data_a}; state <= S2; else if (priority_wrt_b == 1'b1) memarray[cac_data_avail_mem_add_b] <= {cac_data_avail_mem_add_b, cac_mem_data_b}; state <= S2; else if(cac_mem_wrt_a == 1'b1) memarray[cac_data_avail_mem_add_a] <= {cac_data_avail_mem_add_a, cac_mem_data_a}; state <= S2; else if (cac_mem_wrt_b == 1'b1) memarray[cac_data_avail_mem_add_b] <= {cac_data_avail_mem_add_b, cac_mem_data_b}; state <= S2; else state <= S2; //S0 S2: if(priority_wrt_a == 1'b1) 50

51 case memarray[cac_data_avail_mem_add_a] <= {cac_data_avail_mem_add_a, cac_mem_data_a}; state <= S2; else if (priority_wrt_b == 1'b1) memarray[cac_data_avail_mem_add_b] <= {cac_data_avail_mem_add_b, cac_mem_data_a}; state <= S2; else if (cac_mem_rd_a == 1'b1) nexta <= 1'b1; next_add <= cac_data_avail_mem_add_a; state <= S2; else if (cac_mem_rd_b == 1'b1) nextb <= 1'b1; next_add <= cac_data_avail_mem_add_b; state <= S2; else if (cac_mem_wrt_a == 1'b1) memarray[cac_data_avail_mem_add_a] <= {cac_data_avail_mem_add_a, cac_mem_data_a}; state <= S2; else if (cac_mem_wrt_b == 1'b1) memarray[cac_data_avail_mem_add_b] <= {cac_data_avail_mem_add_b, cac_mem_data_b}; state <= S2; else if (bus_mem == 1'b1) bus_bus_mem <= 1'b0; if ((ready_bit_a == 1'b1)&(ready_bit_B == 1'b0)) mem_cac_data_a <= memarray[add]; mema <= 1'b1; if ((ready_bit_a == 1'b0)&(ready_bit_B == 1'b1)) mem_cac_data_b <= memarray[add]; memb <= 1'b1; else state <= S2; //S2 51

endcase
endmodule

Memory Bus Controller

`timescale 1ns/100ps
module MemoryBusController(
    // port order follows the test bench instantiation
    clock,
    rst_1,
    bus_bus_req_a,
    bus_bus_req_b,
    bus_req_mem,         // request from the memory
    priority_bus_req_a,
    priority_bus_req_b,
    bus_mem,             // grant to the memory
    bus_a,
    bus_b
);

// Inputs
input clock;
input rst_1;
input bus_bus_req_a;
input bus_bus_req_b;
input priority_bus_req_a;
input priority_bus_req_b;
input bus_req_mem;

//Outputs
output bus_a;
output bus_b;
output bus_mem;

//Registers
reg bus_a;
reg bus_b;
reg bus_mem;
reg [2:0] bus_state;

//Internals
wire clock;
wire rst_1;
wire bus_bus_req_a;
wire bus_bus_req_b;
wire priority_bus_req_a;
wire priority_bus_req_b;
wire bus_req_mem;

//Parameters
parameter S0 = 0; //Initial
parameter S1 = 1; //Granting the bus to the Cache A
parameter S2 = 2; //Granting the bus to the Cache B
parameter S3 = 3; //Granting the bus to the memory
parameter S4 = 4; //Wait

// Module
always @(posedge clock or negedge rst_1)  // clock edge reconstructed; lost in transcription
if (rst_1 == 0) begin
    //Initials
    bus_state <= S0;  // reconstructed from the truncated "Bus_" in the source
    bus_a <= 1'b0;
    bus_b <= 1'b0;
    bus_mem <= 1'b0;
end
else
case (bus_state)
    S0: begin //Initial
        bus_a <= 1'b0;
        bus_b <= 1'b0;
        bus_mem <= 1'b0;
        if (priority_bus_req_a == 1'b1) //Priority Cache A
            bus_state <= S1;
        else if (priority_bus_req_b == 1'b1) //Priority Cache B
            bus_state <= S2;
        else if (bus_bus_req_a == 1'b0 & bus_bus_req_b == 1'b0 & bus_req_mem == 1'b1) //Memory
            bus_state <= S3;
        else if (bus_bus_req_a == 1'b1 & bus_bus_req_b == 1'b0 & bus_req_mem == 1'b1)
            bus_state <= S3;
        else if (bus_bus_req_a == 1'b0 & bus_bus_req_b == 1'b1 & bus_req_mem == 1'b1)
            bus_state <= S3;
        else if (bus_bus_req_a == 1'b1 & bus_bus_req_b == 1'b1 & bus_req_mem == 1'b1)
            bus_state <= S3;
        else if (bus_bus_req_a == 1'b1 & bus_bus_req_b == 1'b0 & bus_req_mem == 1'b0)
            bus_state <= S1;
        else if (bus_bus_req_a == 1'b0 & bus_bus_req_b == 1'b1 & bus_req_mem == 1'b0)
            bus_state <= S2;

        else if (bus_bus_req_a == 1'b1 & bus_bus_req_b == 1'b1 & bus_req_mem == 1'b0)
            bus_state <= S1;
        else
            bus_state <= S0;  // transition target reconstructed; garbled in the source
    end
    S1: begin //For Cache A
        bus_a <= 1'b1;
        bus_state <= S4;      // transition target reconstructed
    end
    S2: begin //For Cache B
        bus_b <= 1'b1;
        bus_state <= S4;      // transition target reconstructed
    end
    S3: begin //For memory
        bus_mem <= 1'b1;
        bus_state <= S4;      // transition target reconstructed
    end
    S4: begin //Wait
        bus_a <= 1'b0;
        bus_b <= 1'b0;
        bus_mem <= 1'b0;
        bus_state <= S0;      // transition target reconstructed
    end
endcase
endmodule

Memory Mapping Unit

`timescale 1ns / 100ps
module MMU (
    clock, rst_1,
    cpu_cac_add_a, cpu_cac_add_b,
    cpu_cac_rd_a, cpu_cac_rd_b,
    cpu_cac_wrt_a, cpu_cac_wrt_b,
    cpu_cac_data_a, cpu_cac_data_b,
    cpu_mmu_add_a, cpu_mmu_add_b,
    cpu_mmu_rd_a, cpu_mmu_rd_b,
    cpu_mmu_wrt_a, cpu_mmu_wrt_b,

    cpu_mmu_data_a, cpu_mmu_data_b
);

//Inputs
input clock, rst_1;
input cpu_mmu_rd_a, cpu_mmu_rd_b;
input cpu_mmu_wrt_a, cpu_mmu_wrt_b;
input [4:0] cpu_mmu_data_a;
input [4:0] cpu_mmu_data_b;
input [6:0] cpu_mmu_add_a;
input [6:0] cpu_mmu_add_b;

//Outputs
output cpu_cac_rd_a, cpu_cac_rd_b;
output cpu_cac_wrt_a, cpu_cac_wrt_b;
output [4:0] cpu_cac_data_a;
output [4:0] cpu_cac_data_b;
output [4:0] cpu_cac_add_a;
output [4:0] cpu_cac_add_b;

//Registers
reg cpu_cac_rd_a, cpu_cac_rd_b;
reg cpu_cac_wrt_a, cpu_cac_wrt_b;
reg [4:0] cpu_cac_data_a;
reg [4:0] cpu_cac_data_b;
reg [4:0] cpu_cac_add_a;
reg [4:0] cpu_cac_add_b;

//Internals
wire clock, rst_1, cpu_mmu_rd_a, cpu_mmu_rd_b;
wire cpu_mmu_wrt_a, cpu_mmu_wrt_b;

//Begin//
always @(posedge clock or negedge rst_1)  // clock edge reconstructed; lost in transcription
if (~rst_1) begin
    //initials
    cpu_cac_add_a <= 5'b0;
    cpu_cac_add_b <= 5'b0;
    cpu_cac_data_a <= 5'b0;
    cpu_cac_data_b <= 5'b0;
    cpu_cac_rd_a <= 1'b0;
    cpu_cac_rd_b <= 1'b0;
    cpu_cac_wrt_a <= 1'b0;
    cpu_cac_wrt_b <= 1'b0;
end
else begin
    cpu_cac_wrt_a <= cpu_mmu_wrt_a;
    cpu_cac_wrt_b <= cpu_mmu_wrt_b;
    cpu_cac_rd_a <= cpu_mmu_rd_a;
    cpu_cac_rd_b <= cpu_mmu_rd_b;
    cpu_cac_data_a <= cpu_mmu_data_a;

    cpu_cac_data_b <= cpu_mmu_data_b;
    cpu_cac_add_a <= cpu_mmu_add_a[4:0];
    cpu_cac_add_b <= cpu_mmu_add_b[4:0];
end
endmodule

Test Benches: 1/2

`timescale 1ns / 100ps
module Test_Bench_1_2 ();

//Inputs
reg clock, rst_l, cpu_mmu_rd_a, cpu_mmu_wrt_a;
reg cpu_mmu_rd_b, cpu_mmu_wrt_b;
reg [4:0] cpu_mmu_data_a;
reg [4:0] cpu_mmu_data_b;
reg [6:0] cpu_mmu_add_a;
reg [6:0] cpu_mmu_add_b;

// Cache - Cpu
wire cac_cpu_hit_a;
wire cac_cpu_miss_a;
wire [4:0] cac_cpu_data_a;
wire cac_cpu_hit_b;
wire cac_cpu_miss_b;
wire [4:0] cac_cpu_data_b;

// Cache - Memory Bus Controller
wire bus_req_a, priority_req_a;
wire bus_req_b, priority_req_b, req_mem;
wire bus_req_mem, enable_mem, bus_a, bus_b, bus_mem;

// Cache - Memory
wire cac_mem_rd_a, cac_mem_wrt_a, data_avail_mem_a, priority_wrt_a;
wire cac_mem_rd_b, cac_mem_wrt_b, priority_wrt_b, data_avail_mem_b;
wire [4:0] cac_mem_add_a, cac_mem_data_a, mem_cac_data_a;
wire [4:0] cac_mem_add_b, cac_mem_data_b, mem_cac_data_b;

// MMU - Cache
wire cpu_cac_rd_a, cpu_cac_wrt_a;
wire cpu_cac_rd_b, cpu_cac_wrt_b;
wire [4:0] cpu_cac_add_a, cpu_cac_data_a;
wire [4:0] cpu_cac_add_b, cpu_cac_data_b;

//Cache A - B
wire snoop_a, snoop_b;
wire invalidate_a, invalidate_b;
wire [4:0] snoop_add_a;
wire [4:0] snoop_add_b;

// Instantiate Memory
Memory memory_0 (clock, rst_l, cac_mem_add_a, cac_mem_data_a, cac_mem_rd_a, cac_mem_wrt_a, mem_cac_data_a, cac_mem_add_b, cac_mem_data_b, cac_mem_rd_b, cac_mem_wrt_b, mem_cac_data_b, bus_req_mem, data_avail_mem_a, data_avail_mem_b, priority_wrt_a, priority_wrt_b, bus_mem);

//Instantiate MMU
MMU mmu_0 (clock, rst_l, cpu_mmu_add_a, cpu_mmu_add_b, cpu_mmu_rd_a, cpu_mmu_rd_b, cpu_mmu_wrt_a, cpu_mmu_wrt_b, cpu_mmu_data_a, cpu_mmu_data_b, cpu_cac_add_a, cpu_cac_add_b, cpu_cac_rd_a, cpu_cac_rd_b, cpu_cac_wrt_a, cpu_cac_wrt_b, cpu_cac_data_a, cpu_cac_data_b);

//Instantiate CacheA
Cache_A cache_a (clock, rst_l, cpu_cac_add_a, cpu_cac_rd_a, cpu_cac_wrt_a, cpu_cac_data_a, mem_cac_data_a, data_avail_mem_a, cac_cpu_hit_a, cac_cpu_miss_a, cac_cpu_data_a, cac_mem_add_a, cac_mem_data_a, cac_mem_rd_a, cac_mem_wrt_a, priority_wrt_a, bus_req_a, bus_a, priority_req_a, snoop_b, snoop_add_b, snoop_a, snoop_add_a, invalidate_a, invalidate_b);

//Instantiate CacheB
Cache_B cache_b (clock, rst_l, cpu_cac_add_b, cpu_cac_rd_b, cpu_cac_wrt_b, cpu_cac_data_b, mem_cac_data_b, data_avail_mem_b, cac_cpu_hit_b, cac_cpu_miss_b, cac_cpu_data_b, cac_mem_add_b, cac_mem_data_b, cac_mem_rd_b, cac_mem_wrt_b, priority_wrt_b, bus_req_b, bus_b, priority_req_b, snoop_a, snoop_add_a, snoop_b, snoop_add_b, invalidate_a, invalidate_b);

//Instantiate MemoryBusController
MemoryBusController mbc_0 (clock, rst_l, bus_req_a, bus_req_b, req_mem, priority_req_a, priority_req_b, enable_mem, bus_a, bus_b);

always #5 clock <= ~clock;

// Start
initial begin
    //initials
    clock <= 1'b0;
    rst_l <= 1'b1;
    cpu_mmu_rd_a <= 1'b0;
    cpu_mmu_rd_b <= 1'b0;
    cpu_mmu_wrt_a <= 1'b0;
    cpu_mmu_wrt_b <= 1'b0;
    cpu_mmu_add_a <= 7'b0;
    cpu_mmu_add_b <= 7'b0;
    cpu_mmu_data_a <= 5'b0;
    cpu_mmu_data_b <= 5'b0;

    # 10 rst_l <= 1'b0;
    # 10 rst_l <= 1'b1;

    # 10 cpu_mmu_rd_a <= 1'b1; cpu_mmu_add_a <= 7'b ;
         cpu_mmu_rd_b <= 1'b1; cpu_mmu_add_b <= 7'b ;
    # 10 cpu_mmu_rd_a <= 1'b0; cpu_mmu_rd_b <= 1'b0;
    # 100

    # 10 cpu_mmu_rd_a <= 1'b1; cpu_mmu_add_a <= 7'b ;

         cpu_mmu_rd_b <= 1'b1; cpu_mmu_add_b <= 7'b ;
    # 10 cpu_mmu_rd_a <= 1'b0; cpu_mmu_rd_b <= 1'b0;
    # 100

    # 10 cpu_mmu_rd_a <= 1'b1; cpu_mmu_add_a <= 7'b ;
         cpu_mmu_rd_b <= 1'b1; cpu_mmu_add_b <= 7'b ;
    # 10 cpu_mmu_rd_a <= 1'b0; cpu_mmu_rd_b <= 1'b0;
end
endmodule

Test Bench 3 (only the stimulus differs from Test Bench 1/2; the module wrapper is the same):

initial begin
    # 10 cpu_mmu_wrt_a <= 1'b1; cpu_mmu_add_a <= 7'b ; cpu_mmu_data_a <= 5'b11011;
         cpu_mmu_wrt_b <= 1'b1; cpu_mmu_add_b <= 7'b ; cpu_mmu_data_b <= 5'b10001;
    # 10 cpu_mmu_wrt_a <= 1'b0; cpu_mmu_wrt_b <= 1'b0;
    # 150

    # 10 cpu_mmu_wrt_a <= 1'b1; cpu_mmu_add_a <= 7'b ; cpu_mmu_data_a <= 5'b00100;
         cpu_mmu_wrt_b <= 1'b1; cpu_mmu_add_b <= 7'b ; cpu_mmu_data_b <= 5'b01110;
    # 10 cpu_mmu_wrt_a <= 1'b0; cpu_mmu_wrt_b <= 1'b0;
end

Test Bench 4 (only the stimulus differs from Test Bench 1/2; the module wrapper is the same):

initial begin
    # 10 cpu_mmu_wrt_a <= 1'b1; cpu_mmu_add_a <= 7'b ; cpu_mmu_data_a <= 5'b11011;
    # 10 cpu_mmu_wrt_a <= 1'b0;
    # 100

    # 10 cpu_mmu_wrt_a <= 1'b1; cpu_mmu_add_a <= 7'b ; cpu_mmu_data_a <= 5'b00100;
    # 10 cpu_mmu_wrt_a <= 1'b0;
    # 50

    # 10 cpu_mmu_rd_b <= 1'b1; cpu_mmu_add_b <= 7'b ;
    # 10 cpu_mmu_rd_b <= 1'b0;
    # 200
end
endmodule

References

1) John L. Hennessy and David A. Patterson, Computer Architecture: A Quantitative Approach.
2) David A. Patterson and John L. Hennessy, Computer Organization and Design.
3) Michael D. Ciletti, Advanced Digital Design with the Verilog HDL.


More information

ECE331: Hardware Organization and Design

ECE331: Hardware Organization and Design ECE331: Hardware Organization and Design Lecture 24: Cache Performance Analysis Adapted from Computer Organization and Design, Patterson & Hennessy, UCB Overview Last time: Associative caches How do we

More information

Lecture 24: Thread Level Parallelism -- Distributed Shared Memory and Directory-based Coherence Protocol

Lecture 24: Thread Level Parallelism -- Distributed Shared Memory and Directory-based Coherence Protocol Lecture 24: Thread Level Parallelism -- Distributed Shared Memory and Directory-based Coherence Protocol CSE 564 Computer Architecture Fall 2016 Department of Computer Science and Engineering Yonghong

More information

Chap. 4 Multiprocessors and Thread-Level Parallelism

Chap. 4 Multiprocessors and Thread-Level Parallelism Chap. 4 Multiprocessors and Thread-Level Parallelism Uniprocessor performance Performance (vs. VAX-11/780) 10000 1000 100 10 From Hennessy and Patterson, Computer Architecture: A Quantitative Approach,

More information

Hardware Description Language (HDL)

Hardware Description Language (HDL) Hardware Description Language (HDL) What is the need for Hardware Description Language? Model, Represent, And Simulate Digital Hardware Hardware Concurrency Parallel Activity Flow Semantics for Signal

More information

CS433 Homework 6. Problem 1 [15 points] Assigned on 11/28/2017 Due in class on 12/12/2017

CS433 Homework 6. Problem 1 [15 points] Assigned on 11/28/2017 Due in class on 12/12/2017 CS433 Homework 6 Assigned on 11/28/2017 Due in class on 12/12/2017 Instructions: 1. Please write your name and NetID clearly on the first page. 2. Refer to the course fact sheet for policies on collaboration.

More information

EE178 Lecture Verilog FSM Examples. Eric Crabill SJSU / Xilinx Fall 2007

EE178 Lecture Verilog FSM Examples. Eric Crabill SJSU / Xilinx Fall 2007 EE178 Lecture Verilog FSM Examples Eric Crabill SJSU / Xilinx Fall 2007 In Real-time Object-oriented Modeling, Bran Selic and Garth Gullekson view a state machine as: A set of input events A set of output

More information

Lab 7 (All Sections) Prelab: Verilog Review and ALU Datapath and Control

Lab 7 (All Sections) Prelab: Verilog Review and ALU Datapath and Control Lab 7 (All Sections) Prelab: Verilog Review and ALU Datapath and Control Name: Sign the following statement: On my honor, as an Aggie, I have neither given nor received unauthorized aid on this academic

More information

6 th Lecture :: The Cache - Part Three

6 th Lecture :: The Cache - Part Three Dr. Michael Manzke :: CS7031 :: 6 th Lecture :: The Cache - Part Three :: October 20, 2010 p. 1/17 [CS7031] Graphics and Console Hardware and Real-time Rendering 6 th Lecture :: The Cache - Part Three

More information

Computer Organization

Computer Organization University of Pune S.E. I.T. Subject code: 214442 Computer Organization Part 25 : MESI Protocol UNIT IV Tushar B. Kute, Department of Information Technology, Sandip Institute of Technology & Research Centre,

More information

EECS 470 Final Exam Winter 2012

EECS 470 Final Exam Winter 2012 EECS 470 Final Exam Winter 2012 Name: unique name: Sign the honor code: I have neither given nor received aid on this exam nor observed anyone else doing so. Scores: # Points Page 2 /11 Page 3 /13 Page

More information

High Performance Multiprocessor System

High Performance Multiprocessor System High Performance Multiprocessor System Requirements : - Large Number of Processors ( 4) - Large WriteBack Caches for Each Processor. Less Bus Traffic => Higher Performance - Large Shared Main Memories

More information

Overview: Shared Memory Hardware. Shared Address Space Systems. Shared Address Space and Shared Memory Computers. Shared Memory Hardware

Overview: Shared Memory Hardware. Shared Address Space Systems. Shared Address Space and Shared Memory Computers. Shared Memory Hardware Overview: Shared Memory Hardware Shared Address Space Systems overview of shared address space systems example: cache hierarchy of the Intel Core i7 cache coherency protocols: basic ideas, invalidate and

More information

Overview: Shared Memory Hardware

Overview: Shared Memory Hardware Overview: Shared Memory Hardware overview of shared address space systems example: cache hierarchy of the Intel Core i7 cache coherency protocols: basic ideas, invalidate and update protocols false sharing

More information

Lecture 24: Virtual Memory, Multiprocessors

Lecture 24: Virtual Memory, Multiprocessors Lecture 24: Virtual Memory, Multiprocessors Today s topics: Virtual memory Multiprocessors, cache coherence 1 Virtual Memory Processes deal with virtual memory they have the illusion that a very large

More information

Verilog Coding Guideline

Verilog Coding Guideline Verilog Coding Guideline Digital Circuit Lab TA: Po-Chen Wu Outline Introduction to Verilog HDL Verilog Syntax Combinational and Sequential Logics Module Hierarchy Write Your Design Finite State Machine

More information

M A S S A C H U S E T T S I N S T I T U T E O F T E C H N O L O G Y DEPARTMENT OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE

M A S S A C H U S E T T S I N S T I T U T E O F T E C H N O L O G Y DEPARTMENT OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE M A S S A C H U S E T T S I N S T I T U T E O F T E C H N O L O G Y DEPARTMENT OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE 6.111 Introductory Digital Systems Laboratory Fall 2017 Lecture PSet #6 of

More information

Lecture 8: Snooping and Directory Protocols. Topics: split-transaction implementation details, directory implementations (memory- and cache-based)

Lecture 8: Snooping and Directory Protocols. Topics: split-transaction implementation details, directory implementations (memory- and cache-based) Lecture 8: Snooping and Directory Protocols Topics: split-transaction implementation details, directory implementations (memory- and cache-based) 1 Split Transaction Bus So far, we have assumed that a

More information

EECS150 - Digital Design Lecture 5 - Verilog Logic Synthesis

EECS150 - Digital Design Lecture 5 - Verilog Logic Synthesis EECS150 - Digital Design Lecture 5 - Verilog Logic Synthesis Jan 31, 2012 John Wawrzynek Spring 2012 EECS150 - Lec05-verilog_synth Page 1 Outline Quick review of essentials of state elements Finite State

More information

Chapter 5. Thread-Level Parallelism

Chapter 5. Thread-Level Parallelism Chapter 5 Thread-Level Parallelism Instructor: Josep Torrellas CS433 Copyright Josep Torrellas 1999, 2001, 2002, 2013 1 Progress Towards Multiprocessors + Rate of speed growth in uniprocessors saturated

More information

CME341 Assignment 4. module if\_else\_combinational\_logic( input [3:0] a, b, output reg [3:0] y ); * begin

CME341 Assignment 4. module if\_else\_combinational\_logic( input [3:0] a, b, output reg [3:0] y ); * begin CME341 Assignment 4 1. The verilog description below is an example of how code can get butchered by an engineer with lazy debugging habits. The lazy debugger wanted to try something and yet be able to

More information

EC 513 Computer Architecture

EC 513 Computer Architecture EC 513 Computer Architecture Cache Coherence - Snoopy Cache Coherence rof. Michel A. Kinsy Consistency in SMs CU-1 CU-2 A 100 Cache-1 A 100 Cache-2 CU- bus A 100 Consistency in SMs CU-1 CU-2 A 200 Cache-1

More information

Lecture 2: Snooping and Directory Protocols. Topics: Snooping wrap-up and directory implementations

Lecture 2: Snooping and Directory Protocols. Topics: Snooping wrap-up and directory implementations Lecture 2: Snooping and Directory Protocols Topics: Snooping wrap-up and directory implementations 1 Split Transaction Bus So far, we have assumed that a coherence operation (request, snoops, responses,

More information

Parallel Computer Architecture Spring Distributed Shared Memory Architectures & Directory-Based Memory Coherence

Parallel Computer Architecture Spring Distributed Shared Memory Architectures & Directory-Based Memory Coherence Parallel Computer Architecture Spring 2018 Distributed Shared Memory Architectures & Directory-Based Memory Coherence Nikos Bellas Computer and Communications Engineering Department University of Thessaly

More information

Problem Set 3 Solutions

Problem Set 3 Solutions Problem Set 3 Solutions ECE 551: Digital System Design and Synthesis Fall 2001 Final Version 1) For each of the following always behaviors: a) Does the given always behavior need a default statement as

More information

ECE331: Hardware Organization and Design

ECE331: Hardware Organization and Design ECE331: Hardware Organization and Design Lecture 22: Direct Mapped Cache Adapted from Computer Organization and Design, Patterson & Hennessy, UCB Intel 8-core i7-5960x 3 GHz, 8-core, 20 MB of cache, 140

More information

Cache Coherence. Bryan Mills, PhD. Slides provided by Rami Melhem

Cache Coherence. Bryan Mills, PhD. Slides provided by Rami Melhem Cache Coherence Bryan Mills, PhD Slides provided by Rami Melhem Cache coherence Programmers have no control over caches and when they get updated. x = 2; /* initially */ y0 eventually ends up = 2 y1 eventually

More information

Thread- Level Parallelism. ECE 154B Dmitri Strukov

Thread- Level Parallelism. ECE 154B Dmitri Strukov Thread- Level Parallelism ECE 154B Dmitri Strukov Introduc?on Thread- Level parallelism Have mul?ple program counters and resources Uses MIMD model Targeted for?ghtly- coupled shared- memory mul?processors

More information

CACHE MEMORIES ADVANCED COMPUTER ARCHITECTURES. Slides by: Pedro Tomás

CACHE MEMORIES ADVANCED COMPUTER ARCHITECTURES. Slides by: Pedro Tomás CACHE MEMORIES Slides by: Pedro Tomás Additional reading: Computer Architecture: A Quantitative Approach, 5th edition, Chapter 2 and Appendix B, John L. Hennessy and David A. Patterson, Morgan Kaufmann,

More information

Memory Hierarchy. Maurizio Palesi. Maurizio Palesi 1

Memory Hierarchy. Maurizio Palesi. Maurizio Palesi 1 Memory Hierarchy Maurizio Palesi Maurizio Palesi 1 References John L. Hennessy and David A. Patterson, Computer Architecture a Quantitative Approach, second edition, Morgan Kaufmann Chapter 5 Maurizio

More information

ECSE 425 Lecture 30: Directory Coherence

ECSE 425 Lecture 30: Directory Coherence ECSE 425 Lecture 30: Directory Coherence H&P Chapter 4 Last Time Snoopy Coherence Symmetric SMP Performance 2 Today Directory- based Coherence 3 A Scalable Approach: Directories One directory entry for

More information

Lecture 18: Coherence Protocols. Topics: coherence protocols for symmetric and distributed shared-memory multiprocessors (Sections

Lecture 18: Coherence Protocols. Topics: coherence protocols for symmetric and distributed shared-memory multiprocessors (Sections Lecture 18: Coherence Protocols Topics: coherence protocols for symmetric and distributed shared-memory multiprocessors (Sections 4.2-4.4) 1 SMP/UMA/Centralized Memory Multiprocessor Main Memory I/O System

More information

UNIT I (Two Marks Questions & Answers)

UNIT I (Two Marks Questions & Answers) UNIT I (Two Marks Questions & Answers) Discuss the different ways how instruction set architecture can be classified? Stack Architecture,Accumulator Architecture, Register-Memory Architecture,Register-

More information

EECS150 - Digital Design Lecture 10 Logic Synthesis

EECS150 - Digital Design Lecture 10 Logic Synthesis EECS150 - Digital Design Lecture 10 Logic Synthesis September 26, 2002 John Wawrzynek Fall 2002 EECS150 Lec10-synthesis Page 1 Logic Synthesis Verilog and VHDL stated out as simulation languages, but quickly

More information

EECS150 - Digital Design Lecture 6 - Logic Simulation. Encoder Example

EECS150 - Digital Design Lecture 6 - Logic Simulation. Encoder Example EECS150 - Digital Design Lecture 6 - Logic Simulation Feb 7, 2013 John Wawrzynek Spring 2013 EECS150 - Lec06-sim Page 1 Encoder Example What is y if x == 4 b1111? always @(x) : encode if (x == 4'b0001)

More information

Lecture 29 Review" CPU time: the best metric" Be sure you understand CC, clock period" Common (and good) performance metrics"

Lecture 29 Review CPU time: the best metric Be sure you understand CC, clock period Common (and good) performance metrics Be sure you understand CC, clock period Lecture 29 Review Suggested reading: Everything Q1: D[8] = D[8] + RF[1] + RF[4] I[15]: Add R2, R1, R4 RF[1] = 4 I[16]: MOV R3, 8 RF[4] = 5 I[17]: Add R2, R2, R3

More information

Chapter 5 (Part II) Large and Fast: Exploiting Memory Hierarchy. Baback Izadi Division of Engineering Programs

Chapter 5 (Part II) Large and Fast: Exploiting Memory Hierarchy. Baback Izadi Division of Engineering Programs Chapter 5 (Part II) Baback Izadi Division of Engineering Programs bai@engr.newpaltz.edu Virtual Machines Host computer emulates guest operating system and machine resources Improved isolation of multiple

More information

MEMORY HIERARCHY BASICS. B649 Parallel Architectures and Programming

MEMORY HIERARCHY BASICS. B649 Parallel Architectures and Programming MEMORY HIERARCHY BASICS B649 Parallel Architectures and Programming BASICS Why Do We Need Caches? 3 Overview 4 Terminology cache virtual memory memory stall cycles direct mapped valid bit block address

More information

Lecture 30: Multiprocessors Flynn Categories, Large vs. Small Scale, Cache Coherency Professor Randy H. Katz Computer Science 252 Spring 1996

Lecture 30: Multiprocessors Flynn Categories, Large vs. Small Scale, Cache Coherency Professor Randy H. Katz Computer Science 252 Spring 1996 Lecture 30: Multiprocessors Flynn Categories, Large vs. Small Scale, Cache Coherency Professor Randy H. Katz Computer Science 252 Spring 1996 RHK.S96 1 Flynn Categories SISD (Single Instruction Single

More information

Snooping coherence protocols (cont.)

Snooping coherence protocols (cont.) Snooping coherence protocols (cont.) A four-state update protocol [ 5.3.3] When there is a high degree of sharing, invalidation-based protocols perform poorly. Blocks are often invalidated, and then have

More information

Computer Architecture

Computer Architecture 18-447 Computer Architecture CSCI-564 Advanced Computer Architecture Lecture 29: Consistency & Coherence Lecture 20: Consistency and Coherence Bo Wu Prof. Onur Mutlu Colorado Carnegie School Mellon University

More information

10 Parallel Organizations: Multiprocessor / Multicore / Multicomputer Systems

10 Parallel Organizations: Multiprocessor / Multicore / Multicomputer Systems 1 License: http://creativecommons.org/licenses/by-nc-nd/3.0/ 10 Parallel Organizations: Multiprocessor / Multicore / Multicomputer Systems To enhance system performance and, in some cases, to increase

More information

Advanced Memory Organizations

Advanced Memory Organizations CSE 3421: Introduction to Computer Architecture Advanced Memory Organizations Study: 5.1, 5.2, 5.3, 5.4 (only parts) Gojko Babić 03-29-2018 1 Growth in Performance of DRAM & CPU Huge mismatch between CPU

More information

Data-Centric Consistency Models. The general organization of a logical data store, physically distributed and replicated across multiple processes.

Data-Centric Consistency Models. The general organization of a logical data store, physically distributed and replicated across multiple processes. Data-Centric Consistency Models The general organization of a logical data store, physically distributed and replicated across multiple processes. Consistency models The scenario we will be studying: Some

More information

Computer Architecture. A Quantitative Approach, Fifth Edition. Chapter 5. Multiprocessors and Thread-Level Parallelism

Computer Architecture. A Quantitative Approach, Fifth Edition. Chapter 5. Multiprocessors and Thread-Level Parallelism Computer Architecture A Quantitative Approach, Fifth Edition Chapter 5 Multiprocessors and Thread-Level Parallelism 1 Introduction Thread-Level parallelism Have multiple program counters Uses MIMD model

More information

ECE 485/585 Microprocessor System Design

ECE 485/585 Microprocessor System Design Microprocessor System Design Lecture 11: Reducing Hit Time Cache Coherence Zeshan Chishti Electrical and Computer Engineering Dept Maseeh College of Engineering and Computer Science Source: Lecture based

More information

CS433 Homework 6. Problem 1 [15 points] Assigned on 11/28/2017 Due in class on 12/12/2017

CS433 Homework 6. Problem 1 [15 points] Assigned on 11/28/2017 Due in class on 12/12/2017 CS433 Homework 6 Assigned on 11/28/2017 Due in class on 12/12/2017 Instructions: 1. Please write your name and NetID clearly on the first page. 2. Refer to the course fact sheet for policies on collaboration.

More information

Shared Symmetric Memory Systems

Shared Symmetric Memory Systems Shared Symmetric Memory Systems Computer Architecture J. Daniel García Sánchez (coordinator) David Expósito Singh Francisco Javier García Blas ARCOS Group Computer Science and Engineering Department University

More information

Portland State University ECE 588/688. Cache Coherence Protocols

Portland State University ECE 588/688. Cache Coherence Protocols Portland State University ECE 588/688 Cache Coherence Protocols Copyright by Alaa Alameldeen 2018 Conditions for Cache Coherence Program Order. A read by processor P to location A that follows a write

More information

CS 152 Computer Architecture and Engineering. Lecture 19: Directory-Based Cache Protocols

CS 152 Computer Architecture and Engineering. Lecture 19: Directory-Based Cache Protocols CS 152 Computer Architecture and Engineering Lecture 19: Directory-Based Cache Protocols Krste Asanovic Electrical Engineering and Computer Sciences University of California, Berkeley http://www.eecs.berkeley.edu/~krste

More information

EITF20: Computer Architecture Part 5.2.1: IO and MultiProcessor

EITF20: Computer Architecture Part 5.2.1: IO and MultiProcessor EITF20: Computer Architecture Part 5.2.1: IO and MultiProcessor Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration I/O MultiProcessor Summary 2 Virtual memory benifits Using physical memory efficiently

More information

Laboratory ELEC

Laboratory ELEC Laboratory ELEC 4708 2003 1.0 Design of an Integrated Multiplier This is a sample design done by Gord Allan. The design is a two s complement multiplier. Gord s files allow a choice of two cell libraries

More information

Handout 3 Multiprocessor and thread level parallelism

Handout 3 Multiprocessor and thread level parallelism Handout 3 Multiprocessor and thread level parallelism Outline Review MP Motivation SISD v SIMD (SIMT) v MIMD Centralized vs Distributed Memory MESI and Directory Cache Coherency Synchronization and Relaxed

More information

Cache Coherence Tutorial

Cache Coherence Tutorial Cache Coherence Tutorial The cache coherence protocol described in the book is not really all that difficult and yet a lot of people seem to have troubles when it comes to using it or answering an assignment

More information

CS 152 Computer Architecture and Engineering. Lecture 19: Directory-Based Cache Protocols

CS 152 Computer Architecture and Engineering. Lecture 19: Directory-Based Cache Protocols CS 152 Computer Architecture and Engineering Lecture 19: Directory-Based Cache Protocols Krste Asanovic Electrical Engineering and Computer Sciences University of California, Berkeley http://www.eecs.berkeley.edu/~krste

More information

CSE 591: Advanced Hardware Design and Verification (2012 Spring) LAB #0

CSE 591: Advanced Hardware Design and Verification (2012 Spring) LAB #0 Lab 0: Tutorial on Xilinx Project Navigator & ALDEC s Active-HDL Simulator CSE 591: Advanced Hardware Design and Verification Assigned: 01/05/2011 Due: 01/19/2011 Table of Contents 1 Overview... 2 1.1

More information

Lecture 25: Multiprocessors

Lecture 25: Multiprocessors Lecture 25: Multiprocessors Today s topics: Virtual memory wrap-up Snooping-based cache coherence protocol Directory-based cache coherence protocol Synchronization 1 TLB and Cache Is the cache indexed

More information

ECE 4514 Digital Design II. Spring Lecture 9: Review of Key Ideas, System Commands and Testbenches

ECE 4514 Digital Design II. Spring Lecture 9: Review of Key Ideas, System Commands and Testbenches ECE 4514 Digital Design II Lecture 9: Review of Key Ideas, System Commands and Testbenches A Language Lecture Iterating the Key Ideas Verilog is a modeling language. It cannot express hardware directly.

More information

ECEN : Microprocessor System Design Department of Electrical and Computer Engineering Texas A&M University. Homework #1 Solutions

ECEN : Microprocessor System Design Department of Electrical and Computer Engineering Texas A&M University. Homework #1 Solutions ECEN 449 749: Microprocessor System Design Department of Electrical and Computer Engineering Texas A&M University Homework #1 Solutions Upload your homework solution to ecampus as a single pdf file. Your

More information

Course Topics - Outline

Course Topics - Outline Course Topics - Outline Lecture 1 - Introduction Lecture 2 - Lexical conventions Lecture 3 - Data types Lecture 4 - Operators Lecture 5 - Behavioral modeling A Lecture 6 Behavioral modeling B Lecture 7

More information

Cache Coherence Protocols: Implementation Issues on SMP s. Cache Coherence Issue in I/O

Cache Coherence Protocols: Implementation Issues on SMP s. Cache Coherence Issue in I/O 6.823, L21--1 Cache Coherence Protocols: Implementation Issues on SMP s Laboratory for Computer Science M.I.T. http://www.csg.lcs.mit.edu/6.823 Cache Coherence Issue in I/O 6.823, L21--2 Processor Processor

More information

ECE232: Hardware Organization and Design

ECE232: Hardware Organization and Design ECE232: Hardware Organization and Design Lecture 22: Introduction to Caches Adapted from Computer Organization and Design, Patterson & Hennessy, UCB Overview Caches hold a subset of data from the main

More information

14:332:231 DIGITAL LOGIC DESIGN. Verilog Functions and Tasks

14:332:231 DIGITAL LOGIC DESIGN. Verilog Functions and Tasks 4:332:23 DIGITAL LOGIC DESIGN Ivan Marsic, Rutgers University Electrical & Computer Engineering Fall 203 Lecture #24: Verilog Time Dimension and Test Benches Verilog Functions and Tasks Verilog function

More information

CSC526: Parallel Processing Fall 2016

CSC526: Parallel Processing Fall 2016 CSC526: Parallel Processing Fall 2016 WEEK 5: Caches in Multiprocessor Systems * Addressing * Cache Performance * Writing Policy * Cache Coherence (CC) Problem * Snoopy Bus Protocols PART 1: HARDWARE Dr.

More information

ECEC 355: Cache Design

ECEC 355: Cache Design ECEC 355: Cache Design November 28, 2007 Terminology Let us first define some general terms applicable to caches. Cache block or line. The minimum unit of information (in bytes) that can be either present

More information

Chapter 5A. Large and Fast: Exploiting Memory Hierarchy

Chapter 5A. Large and Fast: Exploiting Memory Hierarchy Chapter 5A Large and Fast: Exploiting Memory Hierarchy Memory Technology Static RAM (SRAM) Fast, expensive Dynamic RAM (DRAM) In between Magnetic disk Slow, inexpensive Ideal memory Access time of SRAM

More information

Lecture 25: Multiprocessors. Today s topics: Snooping-based cache coherence protocol Directory-based cache coherence protocol Synchronization

Lecture 25: Multiprocessors. Today s topics: Snooping-based cache coherence protocol Directory-based cache coherence protocol Synchronization Lecture 25: Multiprocessors Today s topics: Snooping-based cache coherence protocol Directory-based cache coherence protocol Synchronization 1 Snooping-Based Protocols Three states for a block: invalid,

More information

Computer Architecture

Computer Architecture Jens Teubner Computer Architecture Summer 2016 1 Computer Architecture Jens Teubner, TU Dortmund jens.teubner@cs.tu-dortmund.de Summer 2016 Jens Teubner Computer Architecture Summer 2016 83 Part III Multi-Core

More information

Pipelined MIPS processor with cache controller using VHDL implementation for educational purpose

Pipelined MIPS processor with cache controller using VHDL implementation for educational purpose Journal From the SelectedWorks of Kirat Pal Singh Winter December 28, 203 Pipelined MIPS processor with cache controller using VHDL implementation for educational purpose Hadeel Sh. Mahmood, College of

More information