Way-Predicting Cache and Pseudo-Associative Cache for High Performance and Low Energy Consumption


Xiaomin Ding, Minglei Wang (Team 28)
Electrical and Computer Engineering, University of Florida, Gainesville, USA {dingxiaomin,

Abstract - The cache is a crucial part of a computer architecture and strongly influences the performance of the whole system. In this paper, we adapted two previously proposed cache optimization methods, the way-predicting cache and the pseudo-associative cache; chose AMAT (miss rate, hit time) and energy consumption as the main analytical factors; constructed simulators by modifying the source code of SimpleScalar; and carried out simulation and evaluation experiments on the SimpleScalar platform using SPEC2000 benchmarks. The test results largely match the principles of theoretical computer architecture. Finally, we present the outcomes and conclusions.

Keywords- cache performance; way prediction; pseudo-associativity; energy consumption

I. INTRODUCTION

From the earliest days of computing, programmers have wanted unlimited amounts of fast memory, and they took advantage of the principle of locality by organizing the memory of a computer into a hierarchy [1]. A memory hierarchy consists of multiple levels of memory with different speeds and sizes. The cache [2] is the level of the memory hierarchy between the processor and main memory. The disparity between processor and DRAM memory speed has kept increasing in recent years: microprocessor performance has improved at roughly 60% per year, while DRAM performance has improved at less than 10% per year [3]. Because the cache bridges the gap between the CPU and DRAM memory, cache performance affects the performance of the computer much more than before. Many cache optimization strategies already exist [4], such as using way prediction to reduce hit time, using pipelined cache access to increase cache bandwidth, and using critical word first and early restart to reduce miss penalty [5].
In this paper, we choose two optimization methods, way prediction and the pseudo-associative cache, to improve cache performance and evaluate the outcomes of the simulations. In Section II, we briefly introduce the background and related work, including the principles of caches and the tools used in this paper. In Section III, the specific approaches (the principles and realization of the two optimizations) are presented. In Section IV, we list the results of our project and evaluate and compare them. In Section V, we draw conclusions from the implemented simulations.

II. BACKGROUND AND RELATED RESEARCH

In this section, we first briefly introduce the principle of the cache, evaluation methods, and some related research. Then we describe the simulation tools and the benchmark used in our project.

A. Principle of the Cache

In a computer system, a cache stores data so that probable future requests for that data can be served faster [6]. The data stored in a cache might be values that were requested or computed earlier by the processor, or duplicates of original values stored in lower-level memory (main memory or disks). A cache hit occurs when the requested data is contained in the cache [7]; such requests can be served by simply reading the cache, which is relatively fast. Otherwise, on a cache miss [8], the data has to be fetched from lower-level memory, which is considerably slower.

There are three categories of cache organization that define how a block is placed in a cache: direct mapped, fully associative, and set associative. When the processor sends a request for data at a specified address to the cache, the address is first divided into two parts: the block address and the block offset. The block address can be further divided into the tag field and the index field. The block offset field selects the desired data from the block, the index field selects the set, and the tag field is compared against the stored tags for a hit. Figure 1 shows the three portions of an address in a set-associative or direct-mapped cache [2].

Figure 1. Three portions of a block address (tag, index, block offset) [2]

B. Evaluation Methods

There are several methods for evaluating cache performance. In this paper, we focus on average memory access time (AMAT), as shown in equation (1):

AMAT = HitTime + MissRate × MissPenalty    (1)

Hence, three factors influence AMAT; reducing any of them improves cache performance.

C. SimpleScalar and SPEC2000

SimpleScalar [9] is an open-source computer architecture simulator developed by Todd Austin at the University of Wisconsin-Madison. It is written in the C programming language and can be used to show that machine A is better than machine B without building either machine [10]. SimpleScalar [11] is a set of tools that model a virtual computer system with a CPU, caches, and a memory hierarchy. Using the SimpleScalar tools, users can build modeling applications that simulate real programs running on a range of modern processors and systems [12]. The tool set includes sample simulators ranging from a fast functional simulator to a detailed, dynamically scheduled processor model that supports non-blocking caches, speculative execution, and state-of-the-art branch prediction.
In our paper, in order to model caches, we use the sim-cache simulator [9] as our basic tool and realize our goals by modifying the source code of sim-cache and related files.

We use the SPEC2000 benchmark suite to test cache performance [13]. SPEC CPU2000 [14] is the next-generation industry-standardized CPU-intensive benchmark suite. SPEC [15] designed CPU2000 to provide a comparative measure of compute-intensive performance across the widest practical range of hardware [16]. The suite consists of source-code benchmarks developed from real user applications. These benchmarks measure the performance of the processor, memory, and compiler of the tested system. In this paper, we choose two benchmarks to test the cache with and without optimization.

D. Related Research

There are dozens of studies on cache performance optimization in these two areas. For the pseudo-associative cache, Yongjoon Lee and Byung-Kwon Chung proposed the pseudo 3-way set-associative cache [17], which overcomes the hit-rate limitation of the 2-way set-associative cache; Bobbala, Salvatierra, and Byeong Kil Lee proposed a composite cache mechanism [18] that emphasizes primary-way utilization and pseudo-associativity for the L2 cache to maximize cache performance. For the way-predicting cache, Inoue, Ishihara, and Murakami proposed way prediction [19] for achieving high performance and low energy consumption in set-associative caches; Hsin-Chuan Chen and Jen-Shiun Chiang proposed a cache scheme that uses valid-bit pre-decision [20] for way prediction to improve cache performance; and Cuiping Xu, Ge Zhang, and Shouqing Hao proposed a way-prediction scheme [21] for achieving low energy consumption and high performance in set-associative instruction caches.

III. APPROACH

In this section, we focus on the two optimization strategies: the pseudo-associative cache and the way-predicting cache. We first introduce the principles of these optimization methods and then describe their realization in SimpleScalar in detail.

A. Pseudo-Associative Cache

1) Principle of the pseudo-associative cache

The pseudo-associative cache, also called the column-associative cache [5], is a cache whose space is logically divided into two zones. On each access, the pseudo-associative cache first acts like a direct-mapped cache in the first zone, meaning each block has only one place where it may appear. If there is a cache hit, the cache behaves exactly like a direct-mapped cache. If the cache misses, the CPU visits a specified location in the other zone. If the cache hits this time, a pseudo hit occurs, and the block is swapped with the block of the first entry; otherwise, the processor accesses the next level of memory to find the desired data, and a real cache miss occurs. In other words, the pseudo-associative cache combines the lower hit time of a direct-mapped cache with the lower miss rate of a 2-way associative cache.

2) Realization of the pseudo-associative cache

In general, the next location to check is found by inverting the highest index bit [5] of the block address. Based on this principle, we use SimpleScalar as our simulation tool and modify its source code to realize a 2-way pseudo-associative cache. Figure 2 shows how the pseudo-associative cache works.

Figure 2. Pseudo-associative cache (block address fields and cache blocks)

In step 1, a block address is given by the processor and the index field is used to select the set; in step 2, the tag field is compared for a hit. If the cache hits, since each set contains only one block, it acts as a direct-mapped cache, and we call this a fast hit.
If the cache misses, the high-order index bit is flipped to form a new index field that selects a block in the cache in step 3, and the tag field is compared again. If the cache hits, it is a pseudo hit, and the blocks are swapped in step 5. Otherwise, the cache misses, and the block must be found in the lower-level memory.

In SimpleScalar, we added a macro definition in the cache.c file to invert the original index of the desired block address. To realize step 5, we wrote a swap function, swap_blk_data(), and we added an energy consumption parameter in the cache.h and cache.c files. All newly added functions must first be declared in the cache.h file. We then mainly modified the cache_access() function to carry out the pseudo-associative behavior: we first configure the pseudo-associative cache as a 2-way set-associative cache and then use a different way of accessing it, which is what the cache_access() function does. When the cache hits on the first access, it activates only one block; on a pseudo hit, it needs to activate two blocks; and on a miss, it also activates two blocks. We then compare the performance of the direct-mapped cache, the 2-way associative cache, and the pseudo-associative cache.

B. Way-Predicting Cache

1) Principle of the way-predicting cache

In the way-predicting cache, we designed a mechanism to predict the way. As a result, the multiplexor is set early to select the desired block, and only a single tag comparison is needed when accessing the cache [4]. Hit time is reduced consequently. If the prediction is correct, the cache access latency is the fast hit time; if not, the cache tries the other block, updates the way predictor, and incurs a latency of one extra clock cycle.

2) Realization of the way-predicting cache

We designed two kinds of way-predicting caches: static way prediction and dynamic way prediction. The first approach uses way prediction throughout the whole cache access process; the second decides whether to use way prediction according to the prediction hit rate. Each approach has pros and cons. The reason for using the dynamic way-predicting cache is that for some programs the locality is so poor that way prediction performs badly, hurting overall cache performance. We modified the cache hit logic and designed a way-prediction model: the predictor selects the last-accessed block as the next target and is updated after every cache access. For the static way-predicting cache, we simply use this predictor during cache access; for the dynamic version, we added several parameters to calculate the prediction hit rate.
We introduced a basic access unit, predict_time_slice, which counts cache accesses; we set its value (the number of cache accesses per slice) to 600 and gathered the number of prediction hits within each slice. If the prediction hit rate is larger than a predetermined threshold (set to 0.9 in our project), the following accesses also use way prediction; if not, the locality in that time slice is poor, so the cache disables way prediction, and all blocks are activated and all tags compared during the following cache accesses. However, the cache still records the number of way-prediction hits, and when the hit rate rises above the threshold again, the cache restarts the way-prediction mechanism. When prediction is active, the cache needs to activate only one block, so hit time and energy consumption are reduced. To be precise and quantitative, we added an energy consumption parameter to evaluate the way-predicting cache.

C. Combining way prediction and the pseudo-associative cache in SimpleScalar

The two methods are combined as the last step. For simplicity, we use different input parameters on the command line to decide which optimization strategy to apply. For example, to apply the dynamic way-prediction method to the level 1 data cache, the command line is as follows:

-cache:dl1 dl1:256:32:1:d

where d selects the dynamic way-predicting cache, w the static way-predicting cache, and p the pseudo-associative cache.

IV. RESULTS AND ANALYSIS

A. Evaluation Setup

To evaluate the proposed approaches and compare the cache optimization strategies, we test the constructed simulator with benchmarks. We used GZIP and BZIP2, which are part of the SPEC2000 benchmark suite, and ran them in a virtual machine. We simulated the first 200 million instructions of each benchmark and used the level 1 data cache as our target cache.

B. Evaluation results and analysis

In this section, the overall benefits of the proposed approaches compared with the unoptimized cache are summarized first, followed by more detailed results and an analysis of how cache performance is improved by the pseudo-associative and way-prediction optimization strategies.

Based on its principle, the pseudo-associative cache combines the shorter hit time of the direct-mapped cache with the lower miss rate of the 2-way associative cache. Because of the short hit time, AMAT improves, and the number of blocks the pseudo-associative cache needs to activate is smaller than for a 2-way set-associative cache. Its miss rate is also lower than that of the direct-mapped cache. The way-predicting cache reduces hit time, so AMAT improves as well; moreover, each time it hits, it needs to activate only one block, and thanks to locality, energy consumption is lower.

1) Pseudo-associative cache

First, Figure 3 shows the miss rate of the direct-mapped cache and the pseudo-associative cache, plotting cache size on the x-axis and miss rate on the y-axis.

Figure 3. Miss rate for direct-mapped cache and pseudo-associative cache

From Figure 3, it is apparent that the pseudo-associative cache has a lower miss rate than the direct-mapped cache. Theoretically, this is because the pseudo-associative cache effectively behaves like a 2-way associative cache: each address can map to two locations, so fewer conflicts occur. Second, Figure 4 shows the energy consumption of the different cache associativity strategies, with cache size on the x-axis and energy consumption on the y-axis.

Figure 4. Energy consumption for different cache associativity strategies

In our experiments, we assume that activating one block costs 1 unit of energy. From Figure 4, for instance, after running 200 million instructions of the GZIP benchmark with a 16 KB level 1 data cache, the pseudo-associative cache shows the lowest energy consumption of the three organizations. At first sight, this simulation result seems to contradict the theory. Whenever the processor accesses the pseudo-associative cache, only one block is activated if the cache hits in the first zone (a cache fast hit); if the cache misses there, the corresponding block in the second zone is also activated (a hit there is a cache slow hit). Since there is a probability of activating two blocks, the total energy consumption could be expected to be higher than for the direct-mapped cache. However, the higher the fast-hit rate, the lower the energy consumed, and because the pseudo-associative cache swaps blocks after a miss in the first zone, the fast-hit rate is relatively high. As a result, it can still consume less energy than the direct-mapped cache. Figure 5 shows the ratio of cache fast hits to cache slow hits; the fast-hit ratio is far higher than the slow-hit ratio, which means the cache saves energy remarkably.

Figure 5. Hit ratio between cache fast hit and cache slow hit

In this example, the pseudo-associative cache saved 52% of the energy compared with the 2-way set-associative cache.

2) Way-predicting cache

First, based on the principle of the way-predicting cache, the next block to be selected is set in advance, so this mechanism mainly aims at reducing the hit time and consequently decreases AMAT. Moreover, if the prediction hit rate is high, energy consumption drops dramatically. Figure 6 shows the energy consumption of caches with and without way prediction.

Figure 6. Energy consumption of caches with and without way prediction

We can see that the static way-predicting cache has the lowest energy consumption, and the cache without way prediction has the highest. The static way-predicting cache needs to activate only one block on each access, so it must have the lowest energy. The dynamic way-predicting cache disables way prediction when the prediction hit ratio falls below the value we set, which indicates that locality is poor in that period. Although disabling way prediction increases energy consumption during those phases, it avoids the overhead of a long run of useless predictions.

Figure 7 shows the predictor miss rate for the caches using static and dynamic way prediction. In the dynamic predictor, the prediction mechanism is suspended when the prediction hit rate falls below the threshold value (0.9 here); as a result, its predictor miss rate is lower. Suspending prediction avoids the large overhead and wasted time of way prediction when locality is poor in some programs, where bandwidth and time would otherwise be wasted on a high proportion of incorrect predictions.

Figure 7. Predictor miss rate for caches using static and dynamic way prediction

Table 1 shows the energy saved, relative to a cache without way prediction, for the different way-prediction models: no energy is saved for a direct-mapped cache (associativity 1), while the savings reach 29% and 40.5% at the higher associativities. From Table 1 we can conclude that the higher the associativity, the more energy way prediction saves, because a highly associative cache activates more blocks per access.

Table 1. Energy saved compared with a cache without way prediction, for different way-prediction models

3) Other benchmarks

We also tested our simulator with the BZIP2 benchmark, running the first 300 million instructions. The results are shown in Figures 8-11 and conform to the theoretical expectations except for small discrepancies.
Figure 8 shows the miss rate of the direct-mapped and pseudo-associative caches for different cache sizes. Figure 9 shows the energy consumption of the pseudo-associative, 2-way associative, and direct-mapped caches. Figure 10 shows the energy consumption of caches with and without way prediction. Figure 11 shows the prediction hit ratio of the static and dynamic way-predicting caches.

Figure 8. Miss rate of direct-mapped and pseudo-associative caches

Figure 9. Energy consumption for different cache associativity strategies

Figure 10. Energy consumption of caches with and without way prediction

Figure 11. Predictor hit ratio of the static and dynamic way-predicting caches

V. CONCLUSIONS

In this paper, we focused on two popular cache optimization methods, the way-predicting cache and the pseudo-associative cache, to improve cache performance in terms of AMAT (hit time and miss rate) and energy consumption. The pseudo-associative cache combines the shorter hit time of the direct-mapped cache with the relatively lower miss rate of the 2-way set-associative cache. The static way-predicting cache reduces hit time and saves energy, but it may perform badly when it encounters poor locality. We also evaluated the dynamic way-predicting cache, which overcomes this drawback of the static version because it can start or stop the way-prediction mechanism according to the observed locality. We used SPEC2000 benchmarks to test the simulators and to check the agreement between the simulation results and the theoretical conclusions. The simulation results largely comply with the theory.

REFERENCES

[1] Rami J. Ammari, "A Study for Reducing Conflict Misses in Data Cache".
[2] David A. Patterson, John L. Hennessy, "Computer Organization and Design: The Hardware/Software Interface", 2nd ed., San Francisco, CA: Morgan Kaufmann Publishers.
[3] John L. Hennessy, David A. Patterson, "Computer Architecture: A Quantitative Approach", 4th ed., San Francisco: Morgan Kaufmann Publishers.
[4] C. Kozyrakis, "Advanced Caching Techniques".
[5] Markus Kowarschik, Christian Weis, "An Overview of Cache Optimization Techniques and Cache-Aware Numerical Algorithms".
[6]
[7] Santanu Kumar Dash, Thambipillai Srikanthan, "Rapid Estimation of Instruction Cache Hit Rates Using Loop Profiling", IEEE, 2008.
[8] Vlastimil Babka, Lukas Marek, Petr Tuma, "When Misses Differ: Investigating Impact of Cache Misses on Observed Performance", 15th International Conference on Parallel and Distributed Systems.
[9]
[10]
[11] Naraig Manjikian, "Enhancements and Applications of the SimpleScalar Simulator for Undergraduate and Graduate Computer Architecture Education", WCAE '00: Proceedings of the 2000 Workshop on Computer Architecture Education.
[12]
[13]
[14] Jason F. Cantin, "Cache Performance for SPEC CPU2000".
[15] MA Hai-feng, YAO Nian-min, FAN Hong-bo, "Cache Performance Simulation and Analysis under SimpleScalar Platform".
[16] Henning, J. L., "SPEC CPU2000: Measuring CPU Performance in the New Millennium", IEEE Computer, vol. 33, no. 7, July.
[17] Yongjoon Lee, Byung-Kwon Chung, "Pseudo 3-Way Set-Associative Cache: A Way of Reducing Miss Ratio with Fast Access Time", 1999 IEEE Canadian Conference on Electrical and Computer Engineering.

[18] Bobbala, L. D., Salvatierra, J., Byeong Kil Lee, "Composite Pseudo-Associative Cache for Mobile Processors", 2010 IEEE International Symposium on Modeling, Analysis & Simulation of Computer and Telecommunication Systems (MASCOTS).
[19] Inoue, K., Ishihara, T., Murakami, K., "Way-Predicting Set-Associative Cache for High Performance and Low Energy Consumption", International Symposium on Low Power Electronics and Design.
[20] Hsin-Chuan Chen, Jen-Shiun Chiang, "Low-Power Way-Predicting Cache Using Valid-Bit Pre-Decision for Parallel Architectures", 19th International Conference on Advanced Information Networking and Applications.
[21] Cuiping Xu, Ge Zhang, Shouqing Hao, "Fast Way-Prediction Instruction Cache for Energy Efficiency and High Performance", IEEE International Conference on Networking, Architecture, and Storage.


More information

Using a Victim Buffer in an Application-Specific Memory Hierarchy

Using a Victim Buffer in an Application-Specific Memory Hierarchy Using a Victim Buffer in an Application-Specific Memory Hierarchy Chuanjun Zhang Depment of lectrical ngineering University of California, Riverside czhang@ee.ucr.edu Frank Vahid Depment of Computer Science

More information

ECE331: Hardware Organization and Design

ECE331: Hardware Organization and Design ECE331: Hardware Organization and Design Lecture 22: Direct Mapped Cache Adapted from Computer Organization and Design, Patterson & Hennessy, UCB Intel 8-core i7-5960x 3 GHz, 8-core, 20 MB of cache, 140

More information

CISC 662 Graduate Computer Architecture Lecture 16 - Cache and virtual memory review

CISC 662 Graduate Computer Architecture Lecture 16 - Cache and virtual memory review CISC 662 Graduate Computer Architecture Lecture 6 - Cache and virtual memory review Michela Taufer http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David

More information

The levels of a memory hierarchy. Main. Memory. 500 By 1MB 4GB 500GB 0.25 ns 1ns 20ns 5ms

The levels of a memory hierarchy. Main. Memory. 500 By 1MB 4GB 500GB 0.25 ns 1ns 20ns 5ms The levels of a memory hierarchy CPU registers C A C H E Memory bus Main Memory I/O bus External memory 500 By 1MB 4GB 500GB 0.25 ns 1ns 20ns 5ms 1 1 Some useful definitions When the CPU finds a requested

More information

Computer Architecture A Quantitative Approach, Fifth Edition. Chapter 2. Memory Hierarchy Design. Copyright 2012, Elsevier Inc. All rights reserved.

Computer Architecture A Quantitative Approach, Fifth Edition. Chapter 2. Memory Hierarchy Design. Copyright 2012, Elsevier Inc. All rights reserved. Computer Architecture A Quantitative Approach, Fifth Edition Chapter 2 Memory Hierarchy Design 1 Introduction Programmers want unlimited amounts of memory with low latency Fast memory technology is more

More information

LECTURE 5: MEMORY HIERARCHY DESIGN

LECTURE 5: MEMORY HIERARCHY DESIGN LECTURE 5: MEMORY HIERARCHY DESIGN Abridged version of Hennessy & Patterson (2012):Ch.2 Introduction Programmers want unlimited amounts of memory with low latency Fast memory technology is more expensive

More information

Chapter 5 Large and Fast: Exploiting Memory Hierarchy (Part 1)

Chapter 5 Large and Fast: Exploiting Memory Hierarchy (Part 1) Department of Electr rical Eng ineering, Chapter 5 Large and Fast: Exploiting Memory Hierarchy (Part 1) 王振傑 (Chen-Chieh Wang) ccwang@mail.ee.ncku.edu.tw ncku edu Depar rtment of Electr rical Engineering,

More information

LECTURE 4: LARGE AND FAST: EXPLOITING MEMORY HIERARCHY

LECTURE 4: LARGE AND FAST: EXPLOITING MEMORY HIERARCHY LECTURE 4: LARGE AND FAST: EXPLOITING MEMORY HIERARCHY Abridged version of Patterson & Hennessy (2013):Ch.5 Principle of Locality Programs access a small proportion of their address space at any time Temporal

More information

Reducing Miss Penalty: Read Priority over Write on Miss. Improving Cache Performance. Non-blocking Caches to reduce stalls on misses

Reducing Miss Penalty: Read Priority over Write on Miss. Improving Cache Performance. Non-blocking Caches to reduce stalls on misses Improving Cache Performance 1. Reduce the miss rate, 2. Reduce the miss penalty, or 3. Reduce the time to hit in the. Reducing Miss Penalty: Read Priority over Write on Miss Write buffers may offer RAW

More information

Reducing the SPEC2006 Benchmark Suite for Simulation Based Computer Architecture Research

Reducing the SPEC2006 Benchmark Suite for Simulation Based Computer Architecture Research Reducing the SPEC2006 Benchmark Suite for Simulation Based Computer Architecture Research Joel Hestness jthestness@uwalumni.com Lenni Kuff lskuff@uwalumni.com Computer Science Department University of

More information

The Memory Hierarchy. Cache, Main Memory, and Virtual Memory (Part 2)

The Memory Hierarchy. Cache, Main Memory, and Virtual Memory (Part 2) The Memory Hierarchy Cache, Main Memory, and Virtual Memory (Part 2) Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University Cache Line Replacement The cache

More information

Chapter-5 Memory Hierarchy Design

Chapter-5 Memory Hierarchy Design Chapter-5 Memory Hierarchy Design Unlimited amount of fast memory - Economical solution is memory hierarchy - Locality - Cost performance Principle of locality - most programs do not access all code or

More information

LRU. Pseudo LRU A B C D E F G H A B C D E F G H H H C. Copyright 2012, Elsevier Inc. All rights reserved.

LRU. Pseudo LRU A B C D E F G H A B C D E F G H H H C. Copyright 2012, Elsevier Inc. All rights reserved. LRU A list to keep track of the order of access to every block in the set. The least recently used block is replaced (if needed). How many bits we need for that? 27 Pseudo LRU A B C D E F G H A B C D E

More information

EITF20: Computer Architecture Part4.1.1: Cache - 2

EITF20: Computer Architecture Part4.1.1: Cache - 2 EITF20: Computer Architecture Part4.1.1: Cache - 2 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Cache performance optimization Bandwidth increase Reduce hit time Reduce miss penalty Reduce miss

More information

ECE232: Hardware Organization and Design

ECE232: Hardware Organization and Design ECE232: Hardware Organization and Design Lecture 22: Introduction to Caches Adapted from Computer Organization and Design, Patterson & Hennessy, UCB Overview Caches hold a subset of data from the main

More information

Memory Hierarchy Computing Systems & Performance MSc Informatics Eng. Memory Hierarchy (most slides are borrowed)

Memory Hierarchy Computing Systems & Performance MSc Informatics Eng. Memory Hierarchy (most slides are borrowed) Computing Systems & Performance Memory Hierarchy MSc Informatics Eng. 2011/12 A.J.Proença Memory Hierarchy (most slides are borrowed) AJProença, Computer Systems & Performance, MEI, UMinho, 2011/12 1 2

More information

Predictive Line Buffer: A fast, Energy Efficient Cache Architecture

Predictive Line Buffer: A fast, Energy Efficient Cache Architecture Predictive Line Buffer: A fast, Energy Efficient Cache Architecture Kashif Ali MoKhtar Aboelaze SupraKash Datta Department of Computer Science and Engineering York University Toronto ON CANADA Abstract

More information

Copyright 2012, Elsevier Inc. All rights reserved.

Copyright 2012, Elsevier Inc. All rights reserved. Computer Architecture A Quantitative Approach, Fifth Edition Chapter 2 Memory Hierarchy Design 1 Introduction Introduction Programmers want unlimited amounts of memory with low latency Fast memory technology

More information

Cache Architectures Design of Digital Circuits 217 Srdjan Capkun Onur Mutlu http://www.syssec.ethz.ch/education/digitaltechnik_17 Adapted from Digital Design and Computer Architecture, David Money Harris

More information

Memory Hierarchy Computing Systems & Performance MSc Informatics Eng. Memory Hierarchy (most slides are borrowed)

Memory Hierarchy Computing Systems & Performance MSc Informatics Eng. Memory Hierarchy (most slides are borrowed) Computing Systems & Performance Memory Hierarchy MSc Informatics Eng. 2012/13 A.J.Proença Memory Hierarchy (most slides are borrowed) AJProença, Computer Systems & Performance, MEI, UMinho, 2012/13 1 2

More information

Computer Architecture. A Quantitative Approach, Fifth Edition. Chapter 2. Memory Hierarchy Design. Copyright 2012, Elsevier Inc. All rights reserved.

Computer Architecture. A Quantitative Approach, Fifth Edition. Chapter 2. Memory Hierarchy Design. Copyright 2012, Elsevier Inc. All rights reserved. Computer Architecture A Quantitative Approach, Fifth Edition Chapter 2 Memory Hierarchy Design 1 Programmers want unlimited amounts of memory with low latency Fast memory technology is more expensive per

More information

EECS151/251A Spring 2018 Digital Design and Integrated Circuits. Instructors: John Wawrzynek and Nick Weaver. Lecture 19: Caches EE141

EECS151/251A Spring 2018 Digital Design and Integrated Circuits. Instructors: John Wawrzynek and Nick Weaver. Lecture 19: Caches EE141 EECS151/251A Spring 2018 Digital Design and Integrated Circuits Instructors: John Wawrzynek and Nick Weaver Lecture 19: Caches Cache Introduction 40% of this ARM CPU is devoted to SRAM cache. But the role

More information

EE 4683/5683: COMPUTER ARCHITECTURE

EE 4683/5683: COMPUTER ARCHITECTURE EE 4683/5683: COMPUTER ARCHITECTURE Lecture 6A: Cache Design Avinash Kodi, kodi@ohioedu Agenda 2 Review: Memory Hierarchy Review: Cache Organization Direct-mapped Set- Associative Fully-Associative 1 Major

More information

Memory Hierarchy. Advanced Optimizations. Slides contents from:

Memory Hierarchy. Advanced Optimizations. Slides contents from: Memory Hierarchy Advanced Optimizations Slides contents from: Hennessy & Patterson, 5ed. Appendix B and Chapter 2. David Wentzlaff, ELE 475 Computer Architecture. MJT, High Performance Computing, NPTEL.

More information

Pipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome

Pipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome Thoai Nam Pipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome Reference: Computer Architecture: A Quantitative Approach, John L Hennessy & David a Patterson,

More information

For Problems 1 through 8, You can learn about the "go" SPEC95 benchmark by looking at the web page

For Problems 1 through 8, You can learn about the go SPEC95 benchmark by looking at the web page Problem 1: Cache simulation and associativity. For Problems 1 through 8, You can learn about the "go" SPEC95 benchmark by looking at the web page http://www.spec.org/osg/cpu95/news/099go.html. This problem

More information

CS161 Design and Architecture of Computer Systems. Cache $$$$$

CS161 Design and Architecture of Computer Systems. Cache $$$$$ CS161 Design and Architecture of Computer Systems Cache $$$$$ Memory Systems! How can we supply the CPU with enough data to keep it busy?! We will focus on memory issues,! which are frequently bottlenecks

More information

Memory Hierarchy. Slides contents from:

Memory Hierarchy. Slides contents from: Memory Hierarchy Slides contents from: Hennessy & Patterson, 5ed Appendix B and Chapter 2 David Wentzlaff, ELE 475 Computer Architecture MJT, High Performance Computing, NPTEL Memory Performance Gap Memory

More information

Copyright 2012, Elsevier Inc. All rights reserved.

Copyright 2012, Elsevier Inc. All rights reserved. Computer Architecture A Quantitative Approach, Fifth Edition Chapter 2 Memory Hierarchy Design 1 Introduction Programmers want unlimited amounts of memory with low latency Fast memory technology is more

More information

To Use or Not to Use: CPUs Cache Optimization Techniques on GPGPUs

To Use or Not to Use: CPUs Cache Optimization Techniques on GPGPUs To Use or Not to Use: CPUs Optimization Techniques on GPGPUs D.R.V.L.B. Thambawita Department of Computer Science and Technology Uva Wellassa University Badulla, Sri Lanka Email: vlbthambawita@gmail.com

More information

V. Primary & Secondary Memory!

V. Primary & Secondary Memory! V. Primary & Secondary Memory! Computer Architecture and Operating Systems & Operating Systems: 725G84 Ahmed Rezine 1 Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM)

More information

Skewed-Associative Caches: CS752 Final Project

Skewed-Associative Caches: CS752 Final Project Skewed-Associative Caches: CS752 Final Project Professor Sohi Corey Halpin Scot Kronenfeld Johannes Zeppenfeld 13 December 2002 Abstract As the gap between microprocessor performance and memory performance

More information

Memory Hierarchy Basics. Ten Advanced Optimizations. Small and Simple

Memory Hierarchy Basics. Ten Advanced Optimizations. Small and Simple Memory Hierarchy Basics Six basic cache optimizations: Larger block size Reduces compulsory misses Increases capacity and conflict misses, increases miss penalty Larger total cache capacity to reduce miss

More information

A Framework for the Performance Evaluation of Operating System Emulators. Joshua H. Shaffer. A Proposal Submitted to the Honors Council

A Framework for the Performance Evaluation of Operating System Emulators. Joshua H. Shaffer. A Proposal Submitted to the Honors Council A Framework for the Performance Evaluation of Operating System Emulators by Joshua H. Shaffer A Proposal Submitted to the Honors Council For Honors in Computer Science 15 October 2003 Approved By: Luiz

More information

Caches and Memory Hierarchy: Review. UCSB CS240A, Winter 2016

Caches and Memory Hierarchy: Review. UCSB CS240A, Winter 2016 Caches and Memory Hierarchy: Review UCSB CS240A, Winter 2016 1 Motivation Most applications in a single processor runs at only 10-20% of the processor peak Most of the single processor performance loss

More information

Advanced Computer Architecture (CS620)

Advanced Computer Architecture (CS620) Advanced Computer Architecture (CS620) Background: Good understanding of computer organization (eg.cs220), basic computer architecture (eg.cs221) and knowledge of probability, statistics and modeling (eg.cs433).

More information

for Energy Savings in Set-associative Instruction Caches Alexander V. Veidenbaum

for Energy Savings in Set-associative Instruction Caches Alexander V. Veidenbaum Simultaneous Way-footprint Prediction and Branch Prediction for Energy Savings in Set-associative Instruction Caches Weiyu Tang Rajesh Gupta Alexandru Nicolau Alexander V. Veidenbaum Department of Information

More information

Pipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome

Pipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome Pipeline Thoai Nam Outline Pipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome Reference: Computer Architecture: A Quantitative Approach, John L Hennessy

More information

EITF20: Computer Architecture Part 5.1.1: Virtual Memory

EITF20: Computer Architecture Part 5.1.1: Virtual Memory EITF20: Computer Architecture Part 5.1.1: Virtual Memory Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Cache optimization Virtual memory Case study AMD Opteron Summary 2 Memory hierarchy 3 Cache

More information

Adapted from David Patterson s slides on graduate computer architecture

Adapted from David Patterson s slides on graduate computer architecture Mei Yang Adapted from David Patterson s slides on graduate computer architecture Introduction Ten Advanced Optimizations of Cache Performance Memory Technology and Optimizations Virtual Memory and Virtual

More information

Sarah L. Harris and David Money Harris. Digital Design and Computer Architecture: ARM Edition Chapter 8 <1>

Sarah L. Harris and David Money Harris. Digital Design and Computer Architecture: ARM Edition Chapter 8 <1> Chapter 8 Digital Design and Computer Architecture: ARM Edition Sarah L. Harris and David Money Harris Digital Design and Computer Architecture: ARM Edition 215 Chapter 8 Chapter 8 :: Topics Introduction

More information

ECE331: Hardware Organization and Design

ECE331: Hardware Organization and Design ECE331: Hardware Organization and Design Lecture 27: Midterm2 review Adapted from Computer Organization and Design, Patterson & Hennessy, UCB Midterm 2 Review Midterm will cover Section 1.6: Processor

More information

Question?! Processor comparison!

Question?! Processor comparison! 1! 2! Suggested Readings!! Readings!! H&P: Chapter 5.1-5.2!! (Over the next 2 lectures)! Lecture 18" Introduction to Memory Hierarchies! 3! Processor components! Multicore processors and programming! Question?!

More information

Course Administration

Course Administration Spring 207 EE 363: Computer Organization Chapter 5: Large and Fast: Exploiting Memory Hierarchy - Avinash Kodi Department of Electrical Engineering & Computer Science Ohio University, Athens, Ohio 4570

More information

Chapter 5 Memory Hierarchy Design. In-Cheol Park Dept. of EE, KAIST

Chapter 5 Memory Hierarchy Design. In-Cheol Park Dept. of EE, KAIST Chapter 5 Memory Hierarchy Design In-Cheol Park Dept. of EE, KAIST Why cache? Microprocessor performance increment: 55% per year Memory performance increment: 7% per year Principles of locality Spatial

More information

Chapter 5A. Large and Fast: Exploiting Memory Hierarchy

Chapter 5A. Large and Fast: Exploiting Memory Hierarchy Chapter 5A Large and Fast: Exploiting Memory Hierarchy Memory Technology Static RAM (SRAM) Fast, expensive Dynamic RAM (DRAM) In between Magnetic disk Slow, inexpensive Ideal memory Access time of SRAM

More information

Chapter 02. Authors: John Hennessy & David Patterson. Copyright 2011, Elsevier Inc. All rights Reserved. 1

Chapter 02. Authors: John Hennessy & David Patterson. Copyright 2011, Elsevier Inc. All rights Reserved. 1 Chapter 02 Authors: John Hennessy & David Patterson Copyright 2011, Elsevier Inc. All rights Reserved. 1 Figure 2.1 The levels in a typical memory hierarchy in a server computer shown on top (a) and in

More information

High Performance Computer Architecture Prof. Ajit Pal Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur

High Performance Computer Architecture Prof. Ajit Pal Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur High Performance Computer Architecture Prof. Ajit Pal Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture - 26 Cache Optimization Techniques (Contd.) (Refer

More information

A Cache Hierarchy in a Computer System

A Cache Hierarchy in a Computer System A Cache Hierarchy in a Computer System Ideally one would desire an indefinitely large memory capacity such that any particular... word would be immediately available... We are... forced to recognize the

More information

Reducing Power Consumption for High-Associativity Data Caches in Embedded Processors

Reducing Power Consumption for High-Associativity Data Caches in Embedded Processors Reducing Power Consumption for High-Associativity Data Caches in Embedded Processors Dan Nicolaescu Alex Veidenbaum Alex Nicolau Dept. of Information and Computer Science University of California at Irvine

More information

EI338: Computer Systems and Engineering (Computer Architecture & Operating Systems)

EI338: Computer Systems and Engineering (Computer Architecture & Operating Systems) EI338: Computer Systems and Engineering (Computer Architecture & Operating Systems) Chentao Wu 吴晨涛 Associate Professor Dept. of Computer Science and Engineering Shanghai Jiao Tong University SEIEE Building

More information

Chapter 5. Large and Fast: Exploiting Memory Hierarchy

Chapter 5. Large and Fast: Exploiting Memory Hierarchy Chapter 5 Large and Fast: Exploiting Memory Hierarchy Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM) 50ns 70ns, $20 $75 per GB Magnetic disk 5ms 20ms, $0.20 $2 per

More information

Memory hier ar hier ch ar y ch rev re i v e i w e ECE 154B Dmitri Struko Struk v o

Memory hier ar hier ch ar y ch rev re i v e i w e ECE 154B Dmitri Struko Struk v o Memory hierarchy review ECE 154B Dmitri Strukov Outline Cache motivation Cache basics Opteron example Cache performance Six basic optimizations Virtual memory Processor DRAM gap (latency) Four issue superscalar

More information

14:332:331. Week 13 Basics of Cache

14:332:331. Week 13 Basics of Cache 14:332:331 Computer Architecture and Assembly Language Spring 2006 Week 13 Basics of Cache [Adapted from Dave Patterson s UCB CS152 slides and Mary Jane Irwin s PSU CSE331 slides] 331 Week131 Spring 2006

More information

Memory Hierarchy Basics

Memory Hierarchy Basics Computer Architecture A Quantitative Approach, Fifth Edition Chapter 2 Memory Hierarchy Design 1 Memory Hierarchy Basics Six basic cache optimizations: Larger block size Reduces compulsory misses Increases

More information

Locality. Cache. Direct Mapped Cache. Direct Mapped Cache

Locality. Cache. Direct Mapped Cache. Direct Mapped Cache Locality A principle that makes having a memory hierarchy a good idea If an item is referenced, temporal locality: it will tend to be referenced again soon spatial locality: nearby items will tend to be

More information

Techniques for Efficient Processing in Runahead Execution Engines

Techniques for Efficient Processing in Runahead Execution Engines Techniques for Efficient Processing in Runahead Execution Engines Onur Mutlu Hyesoon Kim Yale N. Patt Depment of Electrical and Computer Engineering University of Texas at Austin {onur,hyesoon,patt}@ece.utexas.edu

More information

Hitting the Memory Wall: Implications of the Obvious. Wm. A. Wulf and Sally A. McKee {wulf

Hitting the Memory Wall: Implications of the Obvious. Wm. A. Wulf and Sally A. McKee {wulf Hitting the Memory Wall: Implications of the Obvious Wm. A. Wulf and Sally A. McKee {wulf mckee}@virginia.edu Computer Science Report No. CS-9- December, 99 Appeared in Computer Architecture News, 3():0-,

More information

The University of Adelaide, School of Computer Science 13 September 2018

The University of Adelaide, School of Computer Science 13 September 2018 Computer Architecture A Quantitative Approach, Sixth Edition Chapter 2 Memory Hierarchy Design 1 Programmers want unlimited amounts of memory with low latency Fast memory technology is more expensive per

More information

Modern Computer Architecture

Modern Computer Architecture Modern Computer Architecture Lecture3 Review of Memory Hierarchy Hongbin Sun 国家集成电路人才培养基地 Xi an Jiaotong University Performance 1000 Recap: Who Cares About the Memory Hierarchy? Processor-DRAM Memory Gap

More information

Memory Technology. Caches 1. Static RAM (SRAM) Dynamic RAM (DRAM) Magnetic disk. Ideal memory. 0.5ns 2.5ns, $2000 $5000 per GB

Memory Technology. Caches 1. Static RAM (SRAM) Dynamic RAM (DRAM) Magnetic disk. Ideal memory. 0.5ns 2.5ns, $2000 $5000 per GB Memory Technology Caches 1 Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM) 50ns 70ns, $20 $75 per GB Magnetic disk 5ms 20ms, $0.20 $2 per GB Ideal memory Average access time similar

More information

CS 61C: Great Ideas in Computer Architecture (Machine Structures) Caches Part 2

CS 61C: Great Ideas in Computer Architecture (Machine Structures) Caches Part 2 CS 61C: Great Ideas in Computer Architecture (Machine Structures) Caches Part 2 Instructors: John Wawrzynek & Vladimir Stojanovic http://insteecsberkeleyedu/~cs61c/ Typical Memory Hierarchy Datapath On-Chip

More information

COSC 6385 Computer Architecture - Memory Hierarchy Design (III)

COSC 6385 Computer Architecture - Memory Hierarchy Design (III) COSC 6385 Computer Architecture - Memory Hierarchy Design (III) Fall 2006 Reducing cache miss penalty Five techniques Multilevel caches Critical word first and early restart Giving priority to read misses

More information

Threshold-Based Markov Prefetchers

Threshold-Based Markov Prefetchers Threshold-Based Markov Prefetchers Carlos Marchani Tamer Mohamed Lerzan Celikkanat George AbiNader Rice University, Department of Electrical and Computer Engineering ELEC 525, Spring 26 Abstract In this

More information

Caching Basics. Memory Hierarchies

Caching Basics. Memory Hierarchies Caching Basics CS448 1 Memory Hierarchies Takes advantage of locality of reference principle Most programs do not access all code and data uniformly, but repeat for certain data choices spatial nearby

More information

Caches and Memory Hierarchy: Review. UCSB CS240A, Fall 2017

Caches and Memory Hierarchy: Review. UCSB CS240A, Fall 2017 Caches and Memory Hierarchy: Review UCSB CS24A, Fall 27 Motivation Most applications in a single processor runs at only - 2% of the processor peak Most of the single processor performance loss is in the

More information

Let!s go back to a course goal... Let!s go back to a course goal... Question? Lecture 22 Introduction to Memory Hierarchies

Let!s go back to a course goal... Let!s go back to a course goal... Question? Lecture 22 Introduction to Memory Hierarchies 1 Lecture 22 Introduction to Memory Hierarchies Let!s go back to a course goal... At the end of the semester, you should be able to......describe the fundamental components required in a single core of

More information

Tutorial 11. Final Exam Review

Tutorial 11. Final Exam Review Tutorial 11 Final Exam Review Introduction Instruction Set Architecture: contract between programmer and designers (e.g.: IA-32, IA-64, X86-64) Computer organization: describe the functional units, cache

More information

Agenda. EE 260: Introduction to Digital Design Memory. Naive Register File. Agenda. Memory Arrays: SRAM. Memory Arrays: Register File

Agenda. EE 260: Introduction to Digital Design Memory. Naive Register File. Agenda. Memory Arrays: SRAM. Memory Arrays: Register File EE 260: Introduction to Digital Design Technology Yao Zheng Department of Electrical Engineering University of Hawaiʻi at Mānoa 2 Technology Naive Register File Write Read clk Decoder Read Write 3 4 Arrays:

More information

CS 152 Computer Architecture and Engineering. Lecture 11 - Virtual Memory and Caches

CS 152 Computer Architecture and Engineering. Lecture 11 - Virtual Memory and Caches CS 152 Computer Architecture and Engineering Lecture 11 - Virtual Memory and Caches Krste Asanovic Electrical Engineering and Computer Sciences University of California at Berkeley http://www.eecs.berkeley.edu/~krste

More information

Virtual Memory. Adapted from instructor s supplementary material from Computer. Patterson & Hennessy, 2008, MK]

Virtual Memory. Adapted from instructor s supplementary material from Computer. Patterson & Hennessy, 2008, MK] Virtual Memory Adapted from instructor s supplementary material from Computer Organization and Design, 4th Edition, Patterson & Hennessy, 2008, MK] Virtual Memory Usemain memory asa cache a for secondarymemory

More information

LECTURE 10: Improving Memory Access: Direct and Spatial caches

LECTURE 10: Improving Memory Access: Direct and Spatial caches EECS 318 CAD Computer Aided Design LECTURE 10: Improving Memory Access: Direct and Spatial caches Instructor: Francis G. Wolff wolff@eecs.cwru.edu Case Western Reserve University This presentation uses

More information