Lab Report 6
Chris Dobson
EEL4713

Section 1: Book Problems
The Blue non-revised 4th edition was used.

6.3.1) b) 13.1992 ms

6.3.2) *minimum assumes the case where there is no seek time or rotational delay. b)

6.15.1) b)

6.15.2) b)

6.15.3) RAID 4 is more efficient for small reads, because a single block can be accessed from just one disk instead of all the disks, as RAID 3 requires. RAID 4 also requires fewer reads to build a new parity block, making writing more efficient. RAID 3 has no real advantage over RAID 4.

6.15.4) RAID 5 constructs its parity blocks the same way as RAID 4, but it distributes them across the disks instead of storing them all on one disk. This prevents the parity disk from bottlenecking performance during back-to-back writes. A single write will not see any improvement.

6.18.1) b)

6.18.2)
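To make the difference concrete, here is a minimal sketch (not part of the original report; the 5-disk array and the disk numbering are assumed purely for illustration) of where each scheme places the parity block for a given stripe. The rotation in the RAID 5 case is what spreads parity writes across all the disks.

NUM_DISKS = 5  # assumed array width for illustration

def raid4_parity_disk(stripe):
    # RAID 4: a single dedicated parity disk (the last one here),
    # so every write must update this same disk.
    return NUM_DISKS - 1

def raid5_parity_disk(stripe):
    # RAID 5: parity rotates across the disks stripe by stripe,
    # so back-to-back writes to different stripes update different disks.
    return (NUM_DISKS - 1 - stripe) % NUM_DISKS

for stripe in range(6):
    print(f"stripe {stripe}: RAID 4 parity on disk {raid4_parity_disk(stripe)}, "
          f"RAID 5 parity on disk {raid5_parity_disk(stripe)}")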

Section 2: Cacti and SESC Simulations

Block Size Tradeoffs
The first simulation involved testing different block sizes for the L1 instruction cache. The two charts below summarize the results: the first shows the miss rate for each block size, while the second shows the corresponding change in execution time. The miss rate drops sharply at first, with the decline becoming more gradual as the block size grows. This indicates that the greatest gain per unit of area comes from the smaller block sizes, with the larger ones offering similar miss rates. The execution time follows a different trend: as the block size increases, the execution time initially falls but turns back up before long. This is caused by the larger access latencies that Cacti reported for the larger block sizes.

[Chart: L1 Instruction Cache Miss Rate vs Block Size (miss rate vs. block size in bytes)]
[Chart: L1 Instruction Cache Execution Time vs Block Size (execution time in cycles vs. block size in bytes)]
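The shape of the execution-time curve follows the standard average memory access time relation (this formula is not from the report itself, but it summarizes the tradeoff):

\[ \text{AMAT} = t_{\text{hit}} + \text{miss rate} \times \text{miss penalty} \]

Larger blocks lower the miss-rate term, but the higher hit latency Cacti reports for larger blocks raises the hit-time term on every access, so the execution time reaches a minimum and then climbs once the marginal miss-rate savings no longer cover the added hit latency.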

The second simulation involved testing different block sizes for the L1 data cache. The two charts below summarize the results: the first shows the miss rate for each block size, while the second shows the corresponding change in execution time. The results are significantly different from before: increasing the block size actually increases the miss rate. This means the data cache does not exhibit the same kind of spatial locality as the instruction cache. The fact that a larger block size decreases the total number of blocks available in the cache (a fixed total cache size was used) could also be having a significant effect. The second chart provides some unexpected results. The execution time dips down at first, just as in the previous example, even though the miss rate is increasing. This is likely a byproduct of increasing the L2 cache's block size at the same rate as the data cache's, which brings the execution-time gains from the previous simulation into question. Setting the L2 cache independently to a larger block size caused SESC to lock up, meaning the configuration file was likely in error. Meaningful information can still be determined from the miss-rate charts.

[Chart: L1 Data Cache Miss Rate vs Block Size (miss rate vs. block size in bytes)]
[Chart: L1 Data Cache Execution Time vs Block Size (execution time in cycles vs. block size in bytes)]
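To make the block-count effect concrete, here is a small sketch (not from the report; the 32 KB capacity is only an assumed example) showing how many blocks a fixed-size cache can hold as the block size grows:

CACHE_BYTES = 32 * 1024  # assumed fixed capacity for illustration

for block_bytes in (16, 32, 64, 128, 256, 512):
    num_blocks = CACHE_BYTES // block_bytes
    print(f"block size {block_bytes:>3} B -> {num_blocks:>4} blocks")

# Larger blocks capture more spatial locality per miss, but with fewer blocks
# the cache tracks fewer distinct regions of memory, so capacity and conflict
# misses can rise -- consistent with the data cache's miss rate increasing
# with block size.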

Cache Associativity Tradeoffs
This simulation tested different associativities for the L1 data cache. The first chart shows the miss rate for each associativity, while the second shows the change in execution time. As in the first case, there is a significant initial drop in miss rate that levels off as the associativity increases further. The case with an associativity of 32 has almost the same hit rate as the fully associative case. The same trend seen in the previous simulations appears in the execution time: as the associativity grows, the cache latency increases. Considering that the improvement in hit rate is negligible for the larger cases, it is only natural that the execution time rises sharply.

[Chart: L1 Data Cache Miss Rate vs Associativity (miss rate vs. associativity)]
[Chart: L1 Data Cache Execution Time vs Associativity (execution time in cycles vs. associativity)]
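As a point of reference (a standard relation, not something stated in the report), associativity trades sets for ways at a fixed capacity and block size:

\[ \text{sets} = \frac{\text{cache size}}{\text{block size} \times \text{associativity}} \]

Doubling the associativity halves the number of sets and doubles the number of tags that must be compared in parallel on every access, which helps explain the larger hit latencies Cacti reports for the highly associative configurations.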

Cache Size Tradeoffs
This simulation tested different cache sizes for the L1 instruction cache. The first chart shows the miss rate for each size, while the second shows the change in execution time. The first chart follows the same trend as two of the previous simulations: the miss rate starts out by dropping significantly and then levels off, and the third test case is approximately as efficient as the largest. The execution time never actually drops, however. The change in miss rate never outweighs the hefty increase in cache latency associated with increasing the cache size.

[Chart: L1 Instruction Cache Miss Rate vs Cache Size (miss rate vs. cache size)]
[Chart: L1 Instruction Cache Execution Time vs Cache Size (execution time in cycles vs. cache size)]

Overall Cache Performance Experiment
This simulation involved three different test cases based on the previous results. Each test case represents an increase in cache latency, with the cache attributes maximized to stay within that latency. The data cache was first given an increase in associativity, because that caused the most significant drop in the trials. The instruction cache was first made larger and given a larger block size, because those two attributes made the biggest changes in the previous simulations. The table below summarizes the three configurations used (cache and block sizes in bytes, latencies in cycles), while the chart below it shows the relative execution times. The first case was the fastest by far. The smaller cache latencies must have been more important than the higher hit rates that came with the larger configurations. The fact that the miss rates were already so low (almost always below 5%) and the miss penalty is not too severe gave the configuration with the smaller latency a clear advantage.

Test Case   Data Cache Size   Data Cache Assoc.   Data Cache Block   Data Cache Latency   Instr. Cache Size   Instr. Assoc.   Instr. Block   Instr. Latency
1           32768             8                   32                 4                    65536               2               256            4
2           13172             16                  32                 5                    65536               2               512            5
3           65536             32                  32                 6                    262144              2               512            6

[Chart: Execution Time vs Case Number (execution time in cycles for cases 1-3)]
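A rough calculation with hypothetical numbers (the miss rates and the 20-cycle miss penalty below are illustrative, not values measured in this lab) shows why low hit latency wins once miss rates are already small:

\[ \text{AMAT}_{\text{case 1}} = 4 + 0.03 \times 20 = 4.6 \text{ cycles}, \qquad \text{AMAT}_{\text{case 3}} = 6 + 0.02 \times 20 = 6.4 \text{ cycles} \]

Even if the larger configuration cut the miss rate by a third, the two extra cycles paid on every hit would outweigh the savings, matching the observation that test case 1 was fastest.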

Section 3: Implementing a Simple Data Cache
This section deals with the addition of a basic two-way set-associative cache for the data memory of the pipelined MIPS processor designed in the previous lab. A write-through policy and a no-write-allocate policy were used, so the cache only affects reads. The diagram below represents the cache system without the control logic. Whenever a write occurs, the data is written directly to RAM (and updated in the cache if that address is present), so very little has changed during write cycles. Read cycles are controlled by a simple FSM. Whenever a cache miss occurs, the FSM begins a four-cycle process that reads each of the four words of the block associated with that address into one of the two ways where the data can reside in the cache. The way is chosen randomly. During this process, the rest of the processor is stalled; when the data has been read into the cache, the processor resumes. The total time needed to read the data into the cache is 4 cycles, which is 3 more than the single cycle needed to read from the cache on a hit. The point of this cache is to reduce the effect of high memory latency. It could be very useful in a system where the processor runs much faster than its main working memory, or where the memory has a relatively long latency. The current implementation on the FPGA would likely see no benefit and only a performance decrease, because the memory is capable of running at the same rate as the CPU with no latency. It is likely that other technologies exist where even a simple cache system like this would result in a significant increase in performance (though a write buffer would be necessary to speed up the write process).

[Diagram: data cache system, excluding control logic]

Simulations
Several simulations are shown below that exercise various cache behaviors.
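The actual hardware was written as part of the processor design and is not reproduced here; the following is only a behavioral sketch in Python (the set count and the word-addressed interface are my own assumptions) of the policies described above: two ways, 4-word blocks, write-through with no-write-allocate, random replacement, and a 4-cycle stall on a read miss.

import random

WORDS_PER_BLOCK = 4   # 4-word blocks, as described above
NUM_SETS = 64         # assumed; the report does not give the set count
NUM_WAYS = 2          # two-way set-associative

class SimpleDataCache:
    """Behavioral sketch: write-through, no-write-allocate, random replacement."""

    def __init__(self, memory):
        self.memory = memory  # word-addressed backing store (a dict here)
        # Per way, per set: (valid, tag, list of 4 words)
        self.ways = [[(False, None, [0] * WORDS_PER_BLOCK) for _ in range(NUM_SETS)]
                     for _ in range(NUM_WAYS)]

    def _split(self, addr):
        offset = addr % WORDS_PER_BLOCK
        index = (addr // WORDS_PER_BLOCK) % NUM_SETS
        tag = addr // (WORDS_PER_BLOCK * NUM_SETS)
        return tag, index, offset

    def read(self, addr):
        """Return (value, cycles). Hits take 1 cycle; a miss stalls for 4."""
        tag, index, offset = self._split(addr)
        for way in self.ways:
            valid, wtag, block = way[index]
            if valid and wtag == tag:
                return block[offset], 1
        # Miss: the FSM refills the whole 4-word block into a random way.
        base = addr - offset
        block = [self.memory.get(base + i, 0) for i in range(WORDS_PER_BLOCK)]
        self.ways[random.randrange(NUM_WAYS)][index] = (True, tag, block)
        return block[offset], 4

    def write(self, addr, value):
        """Write-through: memory is always updated; the cache only on a hit."""
        self.memory[addr] = value
        tag, index, offset = self._split(addr)
        for way in self.ways:
            valid, wtag, block = way[index]
            if valid and wtag == tag:
                block[offset] = value  # update in place; no allocation on a miss
        return 1

# Example use with a small fake memory:
mem = {i: i * 10 for i in range(256)}
cache = SimpleDataCache(mem)
print(cache.read(5))   # miss: refills words 4-7, returns (50, 4)
print(cache.read(6))   # hit in the same block: returns (60, 1)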

Case 1: Cache write-through
Case 2: Cache miss

Case 3: Cache hit
Case 4: Cache write