COE758 Digital Systems Engineering


COE758 Digital Systems Engineering
Project #1 Memory Hierarchy: Cache Controller

Objectives
- To learn the functionality of a cache controller and its interaction with block memory (SRAM-based) and SDRAM controllers.
- To get practical experience in the design and implementation of custom logic controllers, and in interfacing to SRAM memory units and other logic devices.
- To learn VHDL coding techniques within the Xilinx ISE CAD environment, using a hardware evaluation platform based on the Xilinx Spartan-3E FPGA.

Background
Most computers are designed to operate with different types of memory organized in a memory hierarchy. In such hierarchies, as the distance from the processor increases, so does the capacity of each memory level; likewise, so does its access time. This structure, together with appropriate operating mechanisms, allows processors to use slower (cheaper) memory effectively: by taking advantage of temporal and spatial locality, the hierarchy can hide most of the delay associated with slow, high-capacity memories.

Closest to the CPU is the cache memory. Cache memory is fast but quite small; it is used to store small amounts of data that have been accessed recently and are likely to be accessed again soon. Data is stored here in blocks, each containing a number of words. To keep track of which blocks are currently stored in the cache, and how they relate to the rest of the memory, the cache controller stores identifiers for the blocks currently held: the index, tag, valid bit, and dirty bit associated with each whole block of data. To access an individual word inside a block of data, a block offset is used as an address into the block itself.

Using these identifiers, the cache controller can respond to read and write requests issued by the CPU, by reading and writing data within specific blocks, or by fetching or writing out whole blocks to the larger, slower main memory. Figure 1 shows a block diagram for a simple memory hierarchy consisting of the CPU, the cache (including the cache controller and the small, fast memory used for data storage), the main memory controller, and main memory proper.

Figure 1: Memory Hierarchy Block Diagram

Application: Simple Cache Controller

1. General Description
The aim of this project is to design and implement a simplified cache controller for a hypothetical computer system. The cache for this system stores a total of 256 bytes, organized into 8 blocks of 32 words each, where each word is 1 byte. Figure 2 shows the organization of the cache.

Figure 2: Organization of the Cache (implemented using BlockRAM)

As part of its operation, the CPU issues read and write requests to the memory system. As shown in Figure 1, the CPU interacts directly only with the cache. The cache controller must determine what action to perform based on the current contents of the cache memory, as well as the identifiers associated with each cache block. The complete structure of the target system is shown in Figure 3 below. The system consists of the CPU, an SDRAM controller, and the cache acting as the intermediary between the two. The cache itself consists of a controller and a local SRAM block, connected as shown. The project aim is to design the controller and, using it, assemble a complete cache using the BlockRAM memory module introduced in Tutorial 3.
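The cache geometry above fixes the address-field widths used later in the Specification: 8 blocks require a 3-bit index, 32 words per block require a 5-bit offset, and the remaining upper 8 bits of a 16-bit address form the tag. The following Python sketch is an illustrative model only (the deliverable itself is VHDL):

```python
# Cache geometry from the project spec: 8 blocks of 32 one-byte words (256 B).
NUM_BLOCKS = 8          # -> 3-bit Index field
WORDS_PER_BLOCK = 32    # -> 5-bit Offset field
ADDR_BITS = 16

INDEX_BITS = NUM_BLOCKS.bit_length() - 1           # 3
OFFSET_BITS = WORDS_PER_BLOCK.bit_length() - 1     # 5
TAG_BITS = ADDR_BITS - INDEX_BITS - OFFSET_BITS    # 8

def split_address(addr: int):
    """Decompose a 16-bit CPU address into (tag, index, offset)."""
    offset = addr & (WORDS_PER_BLOCK - 1)              # low 5 bits
    index = (addr >> OFFSET_BITS) & (NUM_BLOCKS - 1)   # next 3 bits
    tag = addr >> (OFFSET_BITS + INDEX_BITS)           # upper 8 bits
    return tag, index, offset

# Example: split_address(0xABCD) -> (0xAB, 6, 13)
```

In the VHDL design, this decomposition corresponds simply to slicing the Address Word Register into its three fields.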

Figure 3: Complete Cache System

2. Specification

CPU Address Word Organization
The CPU issues 16-bit address words. The address word is received by the cache controller and stored in the Address Word Register (AWR). The AWR consists of three fields, according to the cache organization (see Figure 2): the cache memory stores 8 blocks, each of which contains 32 one-byte words. Therefore, the Index field consists of 3 bits and the Offset field consists of 5 bits. Finally, the upper 8 bits of the address word represent the Tag. Figure 4 illustrates the AWR organization.

Figure 4: Address Word Register Fields

Cache Controller Behavior
The aim of the project is to design a logic circuit that responds appropriately to read and write requests made by the CPU. The actions available to the cache controller are reading or writing to local (cache) memory, fetching whole data blocks from main memory, or writing out blocks to main memory. The controller decides which of these actions to take based on the indicators it stores internally, which describe the status of each block in the cache memory. Specifically, the controller must compare the tag portion of the address with the tag stored for the block selected by the Index field, and check the status of that block's dirty and valid bits.

Behavioral Cases:

1. Write a word to cache [hit]
The cache receives a write request from the CPU, and the block addressed by the CPU is found in the cache (a cache hit). The index and offset portions of the address supplied by the CPU are sent to the local SRAM as the write address for the new data. The dirty and valid bits associated with the targeted block must be set to 1, and the data must be written to the local SRAM memory.

2. Read a word from cache [hit]
The cache receives a read request from the CPU, and the block addressed by the CPU is found in the cache (a cache hit). The index and offset portions of the address supplied by the CPU are sent to the local SRAM, and the read data is routed back to the CPU.

3. Read/Write from/to cache [miss] with dirty bit = 0
A read or write request is received, and the associated block is not found in the cache. This requires performing a block replacement. First, the dirty bit of the block selected by the CPU address must be examined. If the dirty bit is 0, then the entire CPU address, with the offset portion set to 00000, is passed to the SDRAM memory controller as the base address of the block to be read from memory; the full block (all 32 bytes) must be read from the SDRAM memory controller and written to the local cache SRAM. The tag value from the CPU address replaces the value in the corresponding tag register, and the valid bit must be set to 1.
Finally, the requested read or write operation must be performed, following the procedure outlined for the hit cases above.

4. Read/Write from/to cache [miss] with dirty bit = 1
A read or write request is received, and the associated block is not found in the cache (a miss), which leads to a block replacement operation. Again, the dirty bit of the block selected by the CPU address must be examined. If the dirty bit is 1, then the block currently in the cache (and soon to be replaced) must first be written back to main memory: the data block in local SRAM is written to main memory via the SDRAM memory controller, at the base address [Tag & Index & 00000], where Tag is the tag currently stored for that block. Next, the new block (based on the address requested by the CPU) must be copied into the cache. This requires reading data from the SDRAM memory controller at the address [Tag & Index & 00000], where Tag is now the Tag field of the AWR, i.e. the tag most recently requested by the CPU. The full block read from the SDRAM memory controller must be written to the local cache SRAM, and the original tag associated with this block must be replaced with the new tag issued by the CPU. Once this process is complete, the transaction originally requested by the CPU can be completed, following the procedure outlined for the hit cases above.

3. Interface Specifications
Below are the details of the three interfaces with which the cache controller must interact: the CPU, the SDRAM controller, and the local SRAM (BlockRAM). In each case, the type of information carried by the interface, as well as the interface timing, is described.

CPU
The CPU periodically issues read or write transaction requests. The CPU interface consists of a strobe CS, a read/write indicator WR/RD, a 16-bit address ADD, 8-bit data-in and data-out ports DIN and DOUT, and a ready indicator input RDY. The CPU is assumed to be synchronous with the cache controller, and shares the same clock signal, CLK. All interface ports are shown in Figure 5.

Figure 5: CPU Interface

When the CPU issues a transaction, it first sets the appropriate address on the address bus and sets the read/write indicator to the correct value: for a read, the WR/RD signal is low (0); for a write, it is high (1). If a write is performed, the data is also placed on the DOUT port. Once all transaction signals are stable, the strobe CS is asserted, and stays asserted for 4 clock cycles. Figure 6 below shows an example of a write request.

Figure 6: CPU Transaction Example

The ready indicator RDY is used by the cache controller to indicate to the outside world when a transaction is complete. When idle, the cache controller should keep the RDY signal asserted (thus indicating it is ready to accept transactions). Once a transaction is received, the signal should be de-asserted, and should be kept low until the requested operation is complete.

SDRAM Controller
The SDRAM controller responds to block read and write requests issued by the cache controller. Its interface is shown in Figure 7. The interface consists of a 16-bit address ADD, a read/write indicator WR/RD, a strobe MEMSTRB, and data-in and data-out ports DIN and DOUT. The SDRAM controller is synchronized with the cache controller by sharing the same clock, CLK.

Figure 7: SDRAM Controller Interface

The SDRAM controller responds to block read and write requests from the cache controller. To write a word to main memory, the cache controller sets the target address on the ADD port, places the data on the DIN port, and asserts the WR/RD signal (set to 1) to indicate a write operation. Once all other signals are stable (one clock cycle later), the strobe MEMSTRB is asserted for one clock cycle. To write a full block to memory, this process is repeated 32 times, with the address and data changing accordingly. To read a full block from memory, the same process is used, except that WR/RD is held low (0) and no data is placed on the DIN port; instead, the data appears on the DOUT port. Figures 8 and 9 below show the read and write processes for a full block of data.

Figure 8: SDRAM Block Read

Figure 9: SDRAM Block Write
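Each 32-word block transfer walks sequential addresses under a fixed base address whose offset field is 00000. The Python sketch below illustrates the address sequence (in the VHDL design this would typically be a 5-bit counter feeding the low bits of the ADD port; the tag/index values used here are arbitrary examples):

```python
WORDS_PER_BLOCK = 32

def block_addresses(tag: int, index: int):
    """Addresses of the 32 words of one block: base = [Tag & Index & 00000]."""
    base = (tag << 8) | (index << 5)   # offset field forced to 00000
    return [base | offset for offset in range(WORDS_PER_BLOCK)]

# Write-back of a dirty block uses the stored (old) tag;
# the subsequent fetch uses the new tag from the AWR.
writeback = block_addresses(tag=0x3C, index=5)   # 0x3CA0 .. 0x3CBF
fetch     = block_addresses(tag=0xA7, index=5)   # 0xA7A0 .. 0xA7BF
```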

Local SRAM
The BlockRAM memory used to implement the local memory of the cache has the interface shown in Figure 10 (based on Tutorial 3). It consists of an 8-bit address ADD, 8-bit data input and output ports DIN and DOUT, and a write-enable signal WEN. The BlockRAM shares the same clock CLK as the cache controller.

Figure 10: BlockRAM Interface

All operations issued to the BlockRAM are synchronized on the rising edge of the clock. Read operations are performed by setting the appropriate address on the address bus; the addressed data propagates to the output DOUT after the next rising edge of the clock. Write operations are performed by setting the address and data on the appropriate ports and then asserting the write-enable signal WEN; on the next rising edge, the data is written to the specified address. Read and write operations are shown in Figures 11 and 12 below.

Figure 11: BlockRAM Read Operation

Figure 12: BlockRAM Write Operation

4. Target
The target platform/device is the Xilinx Spartan-3E FPGA.

5. Results
The following parameters have to be determined based on hardware emulation:

N  Cache performance parameter                 Time in ns
1  Hit / Miss determination time
2  Data access time
3  Block replacement time
4  Hit time (Cases 1 and 2)
5  Miss penalty for Case 3 (when D-bit = 0)
6  Miss penalty for Case 4 (when D-bit = 1)

Timeline - Milestones

Week  Outline                          What is Due?
4     Project #1 description and specs.
5     VHDL code and compilation.       The symbol and block diagram.
6     Testing and VHDL corrections.    VHDL code and correct compilation.
7     Hardware emulation.
8     Project #1 demonstration.        Project #1 final report submission.
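Before hardware emulation, the four behavioral cases of the Specification can be sanity-checked against a small software golden model. The Python sketch below is illustrative only (class and variable names are our own, not part of the spec; the deliverable itself is VHDL): a direct-mapped, write-back cache of 8 blocks of 32 one-byte words, backed by a list that stands in for the SDRAM.

```python
# Behavioral golden model of the cache described in the spec:
# 8 blocks x 32 one-byte words, write-back, with per-block tag/valid/dirty.
BLOCKS, WORDS = 8, 32

class CacheModel:
    def __init__(self, main_memory):
        self.mem = main_memory                       # stands in for SDRAM (64 Ki bytes)
        self.data = [[0] * WORDS for _ in range(BLOCKS)]
        self.tag = [0] * BLOCKS
        self.valid = [False] * BLOCKS
        self.dirty = [False] * BLOCKS

    def _lookup(self, addr):
        tag, index, offset = addr >> 8, (addr >> 5) & 7, addr & 31
        hit = self.valid[index] and self.tag[index] == tag
        if not hit:                                   # Cases 3 and 4: block replacement
            if self.dirty[index]:                     # Case 4: write back old block first
                base = (self.tag[index] << 8) | (index << 5)
                for i in range(WORDS):
                    self.mem[base + i] = self.data[index][i]
            base = (tag << 8) | (index << 5)          # fetch the new block
            for i in range(WORDS):
                self.data[index][i] = self.mem[base + i]
            self.tag[index], self.valid[index], self.dirty[index] = tag, True, False
        return index, offset

    def read(self, addr):                             # Case 2 (after any replacement)
        index, offset = self._lookup(addr)
        return self.data[index][offset]

    def write(self, addr, byte):                      # Case 1 (after any replacement)
        index, offset = self._lookup(addr)
        self.data[index][offset] = byte
        self.dirty[index] = True
```

A VHDL testbench can replay the same address trace against both this model and the synthesized controller and compare the returned data and the resulting SDRAM contents.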