EC 513 Computer Architecture
1 EC 513 Computer Architecture Cache Organization Prof. Michel A. Kinsy
2 The course has 4 modules Module 1: Instruction Set Architecture (ISA), Simple Pipelining and Hazards. Module 2: Superscalar Architectures, Vector machines, VLIW, Multithreading, GPU. Module 3: Branch Prediction, Caches, Memory Models & Synchronization, Cache Coherence Protocols. Module 4: On-Chip networks, On-chip Network routing
3 Architecture Taxonomy Processor Organizations: Single instruction, single data stream (SISD): Uniprocessor. Single instruction, multiple data stream (SIMD): Vector Processor, Array Processor. Multiple instruction, single data stream (MISD). Multiple instruction, multiple data stream (MIMD): Shared Memory (Tightly Coupled): Symmetric Multiprocessor (SMP), Non-Uniform Memory Access (NUMA); Distributed Memory (Loosely Coupled): Cluster. Parallelism Paradigms: instruction-level, data-level, and task-level parallelism
4 CPU-Memory Bottleneck CPU Memory Performance of high-speed computers is usually limited by memory bandwidth & latency Latency (time for a single access) Memory access time >> Processor cycle time Bandwidth (number of accesses per unit time) if fraction m of instructions access memory, 1+m memory references / instruction Ghost of the stored-program architecture CPI = 1 requires 1+m memory refs / cycle
5 Processor-Memory Gap Performance gap: CPU (55% each year) vs. DRAM (7% each year) Processor operations take on the order of 1 ns Memory access requires 10s or even 100s of ns Each instruction executed involves at least one memory access [Figure: performance vs. time; µProc 60%/year (Moore's Law), DRAM 7%/year; processor-memory performance gap grows 50%/year. From David Patterson, UC Berkeley]
6 Processor-DRAM Gap (latency) A four-issue 2 GHz superscalar accessing 100 ns DRAM could execute 800 instructions during the time for one memory access! [Figure: performance vs. time; µProc 60%/year (Moore's Law), DRAM 7%/year; processor-memory performance gap grows 50%/year. From David Patterson, UC Berkeley]
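The 800-instruction figure follows from simple arithmetic: 100 ns at 2 GHz is 200 cycles, and a four-issue machine can issue up to 4 instructions per cycle. A minimal sketch of that calculation, where the function name and parameter names are illustrative and not from the slides:

```python
def issue_slots_per_memory_access(issue_width, clock_ghz, dram_latency_ns):
    """Peak number of instructions that could issue during one DRAM access.

    GHz times ns gives cycles directly (1 GHz = 1 cycle per ns).
    """
    cycles = clock_ghz * dram_latency_ns
    return issue_width * cycles


# Four-issue, 2 GHz, 100 ns DRAM: the example from the slide.
lost_slots = issue_slots_per_memory_access(issue_width=4, clock_ghz=2, dram_latency_ns=100)
print(lost_slots)  # 800
```

This is a peak figure: it assumes every issue slot would otherwise be filled.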
7 Memory Trends The fastest memories are expensive and thus not very large. Level: capacity, access time, cost (per GB). Reg: 100s B, ns, $Millions. L1 $: 10s KB, few ns, $100s Ks. Ln $: MBs, 10s ns, $10s Ks. Main Memory: 100s MB, 100s ns, $1000s. Secondary Memory: 10s GB, 10s ms, $10s. Transfer units between levels: Reg and L1 $: 4-8 bytes (word). L1 $ and Ln $: 8-32 bytes (block). Ln $ and Main Memory: 1 to 4 blocks. Main Memory and Secondary Memory: 1,024+ bytes (disk sector = page)
8 Illustrative View of Memory Organization A fast memory can help bridge the CPU-memory gap The fastest memories are expensive and thus not very large Control On-Chip Components Datapath ALU RegFile Instr Cache Data Cache Second Level Cache (SRAM) Main Memory (DRAM) Secondary Memory (Disk)
9 Intel Core i7 Organization
10 Intel Haswell
11 Memory Technology Early machines used a variety of memory technologies Manchester Mark I used CRT Memory Storage EDVAC used a mercury delay line Core memory was first large scale reliable main memory Invented by Forrester in late 40s at MIT for Whirlwind project Bits stored as magnetization polarity on small ferrite cores threaded onto 2 dimensional grid of wires
12 Memory Technology First commercial DRAM was Intel Kbit of storage on single chip Charge on a capacitor used to hold value Semiconductor memory quickly replaced core in 1970s Intel formed to exploit market for semiconductor memory Phase change memory (PCM) looking promising for the future Slightly slower, but much denser than DRAM and non-volatile
13 Memory Technology Random Access Memory (RAM) Any byte of memory can be accessed without touching the preceding bytes RAM is the most common type of memory found in computers and other digital devices There are two main types of RAM DRAM (Dynamic Random Access Memory) Needs to be refreshed regularly (~ every 8 ms) 1% to 2% of the active cycles of the DRAM Used for Main Memory SRAM (Static Random Access Memory)
14 Memory Technology Random Access Memory (RAM) Any byte of memory can be accessed without touching the preceding bytes RAM is the most common type of memory found in computers and other digital devices There are two main types of RAM DRAM (Dynamic Random Access Memory) SRAM (Static Random Access Memory) Content will last until power turned off Low density (6 transistor cells), high power, expensive, fast Used for caches
15 RAM Organization One memory row holds a block of data, so the column address selects the requested bit or word from that block [Figure: memory array with 2^N rows (word lines) and 2^M columns (bit lines); the N+M address bits split into N bits for the row address decoder and M bits for the column decoder & sense amplifiers; each memory cell holds one bit; data output D]
16 DRAM Architecture [Figure: memory array with 2^N rows (word lines) and 2^M columns (bit lines); the N+M address bits split into N bits for the row address decoder and M bits for the column decoder & sense amplifiers; each memory cell holds one bit; data output D] Modern chips have around 4 logical banks on each chip Each logical bank physically implemented as many smaller arrays
17 RAM Organization One memory row holds a block of data, so the column address selects the requested bit or word from that block RAS (Row Access Strobe) triggers the row decoder CAS (Column Access Strobe) triggers the column selector [Figure: 2M x 16 SRAM with a 21-bit address, chip select, output enable, write enable, Din[15-0] and Dout[15-0]]
18 RAM Organization Latency: Time to access one word Access time: time between the request and when the data is available (or written) Cycle time: time between requests Usually cycle time > access time Bandwidth: How much data from the memory can be supplied to the processor per unit time Width of the data channel * The rate at which it can be used
19 Typical Memory Reference Patterns [Figure: address vs. time; instruction fetches (n loop iterations), stack accesses (subroutine call, subroutine return, argument access), data accesses (scalar accesses)]
20 A Typical Memory Hierarchy CPU with multi-ported register file (part of CPU); split instruction & data primary caches, L1 Instruction Cache and L1 Data Cache (on-chip SRAM); large unified secondary L2 cache (on-chip SRAM); multiple interleaved memory banks (off-chip DRAM); DMA to disks, external memory, devices, and others
21 Definition of a Cache A cache is simply a copy of a small data segment residing in the main memory Fast but small extra memory Hold identical copies of main memory Lower latency Higher bandwidth Usually several levels (1, 2 and 3)
22 Cache Structures [Figure: the processor exchanges address and data with the cache, which exchanges address and data with main memory; each cache entry holds an address tag and data bytes]
23 Caching & Cache Structures [Figure: same structure; cache entries hold copies of main memory locations, e.g. a copy of main memory location 100]
24 Caching & Cache Structures [Figure: same structure; a line consists of an address tag and a data block]
25 Multilevel Caches Cache is transparent to user (happens automatically) [Figure: the CPU register file exchanges words with cache memory, which exchanges lines with main memory] Data is in the cache fraction h of the time; go to main memory 1 − h of the time
26 Multilevel Caches Cache is transparent to user (happens automatically) [Figure: the CPU register file exchanges words with cache memory, which exchanges lines with main memory] Data is in the cache fraction h of the time; go to main memory 1 − h of the time. For a cache with hit rate h, effective access time is: $C_{eff} = h\,C_{fast} + (1 - h)(C_{slow} + C_{fast}) = C_{fast} + (1 - h)\,C_{slow}$
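To check that the two forms of the effective access time agree, here is a small sketch. The numeric values (95% hit rate, 1-cycle cache, 100-cycle main memory) are assumptions chosen for illustration, not figures from the slides:

```python
import math


def effective_access_time(h, c_fast, c_slow):
    """Hit: pay c_fast. Miss: pay c_fast for the failed lookup, then c_slow."""
    return h * c_fast + (1 - h) * (c_slow + c_fast)


def effective_access_time_simplified(h, c_fast, c_slow):
    """Algebraically equivalent form: every access pays c_fast, misses add c_slow."""
    return c_fast + (1 - h) * c_slow


full = effective_access_time(0.95, 1, 100)
short = effective_access_time_simplified(0.95, 1, 100)
print(full, short)  # both are approximately 6 cycles
assert math.isclose(full, short)
```

Even with a 95% hit rate, the average access costs about six times the cache hit time in this example, which is why the miss term dominates design decisions.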
27 Caches This organization works because most programs exhibit locality The principle of temporal locality says that if a program accesses one memory address, there is a good chance that it will access the same address in the near future The principle of spatial locality says that if a program accesses one memory address, there is a good chance that it will also access other nearby addresses CPU L1 L2 DRAM
28 Caching Principles Cache contains copies of some of Main Memory Those storage locations recently used When Main Memory address A is referenced in CPU Cache checked for a copy of contents of A If found, cache hit Copy used No need to access Main Memory If not found, cache miss Main Memory accessed to get contents of A Copy of contents also loaded into cache
29 Caching principles Cache size (in bytes or words): Total cache capacity. A larger cache can hold more of the program's useful data but is more costly and likely to be slower. Block or cache-line size: Unit of data transfer between cache and main memory. With a larger cache line, more data is brought into the cache with each miss. This can improve the hit rate but may also bring low-utility data into the cache
30 Caching principles Placement policy: Determining where an incoming cache line is stored. More flexible policies imply higher hardware cost and may or may not have performance benefits (due to more complex data location). Replacement policy: Determining which of several existing cache blocks (into which a new cache line can be mapped) should be overwritten. Typical policies: choosing a random or the least recently used block
31 Caching Principles Compulsory misses: With on-demand fetching, the first access to any item is a miss. Capacity misses: We have to evict some items to make room for others. This leads to misses that are not incurred with an infinitely large cache. Conflict misses: The placement scheme may force us to displace useful items to bring in other items. This may lead to misses in the future
32 Caching principles Line width (2^W): Too small a value for W causes a lot of main memory accesses. Too large a value increases the miss penalty and may tie up cache space with low-utility items that are replaced before being used. Set size or associativity (2^S): Direct mapping (S = 0) is simple and fast. Greater associativity leads to more complexity, and thus slower access, but tends to reduce conflict misses
33 Cache Algorithm (Read) Look at Processor Address, search cache tags to find match. Then either: Found in cache, a.k.a. HIT: Return copy of data from cache. Not in cache, a.k.a. MISS: Read block of data from Main Memory, wait, then return data to processor and update cache. Q: Which line do we replace?
34 Caches Local miss rate = misses in cache / accesses to cache Global miss rate = misses in cache / CPU memory accesses Misses per instruction = misses in cache / number of instructions CPU L1 L2 DRAM
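The difference between local and global miss rate only shows up below L1, where the cache sees just the accesses that missed in the level above. A minimal sketch, using made-up counts (1000 CPU accesses, 40 L1 misses, 10 L2 misses, 500 instructions) purely for illustration:

```python
def miss_rates(cpu_accesses, l1_misses, l2_misses, instructions):
    """Local, global, and per-instruction miss metrics for an L1/L2 pair.

    L2 is only accessed on an L1 miss, so accesses to L2 == l1_misses.
    """
    return {
        "l1_local": l1_misses / cpu_accesses,   # same as L1 global
        "l1_global": l1_misses / cpu_accesses,
        "l2_local": l2_misses / l1_misses,      # misses / accesses to L2
        "l2_global": l2_misses / cpu_accesses,  # misses / CPU memory accesses
        "l1_misses_per_instr": l1_misses / instructions,
        "l2_misses_per_instr": l2_misses / instructions,
    }


rates = miss_rates(cpu_accesses=1000, l1_misses=40, l2_misses=10, instructions=500)
for name, value in rates.items():
    print(f"{name}: {value:.3f}")
```

In this example L2 has a local miss rate of 25% but a global miss rate of only 1%; the global rate equals the product of the local rates of L1 and L2.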
35 Cache Performance Metrics Cache miss rate: Number of cache misses divided by number of accesses. Cache hit time: Time between sending address and data returning from cache. Cache miss latency: Time between sending address and data returning from next-level cache/memory. Cache miss penalty: Extra processor stall caused by next-level cache/memory access
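36 Average Memory Access Time Average Memory Access Time (AMAT): $AMAT = \text{Hit time} + \text{Miss rate} \times \text{Miss penalty}$. $\text{Memory stall cycles} = \text{Memory accesses} \times \text{Miss rate} \times \text{Miss penalty}$. $\text{CPU time} = (\text{CPU execution cycles} + \text{Memory stall cycles}) \times \text{Cycle time}$. $CPI = \text{ideal } CPI + \text{average stalls per instruction}$. Having L1 and L2 Caches: $AMAT = \text{Hit time}_{L1} + \text{Miss rate}_{L1} \times \text{Miss penalty}_{L1}$, where $\text{Miss penalty}_{L1} = \text{Hit time}_{L2} + \text{Miss rate}_{L2} \times \text{Miss penalty}_{L2}$, so $AMAT = \text{Hit time}_{L1} + \text{Miss rate}_{L1} \times (\text{Hit time}_{L2} + \text{Miss rate}_{L2} \times \text{Miss penalty}_{L2})$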
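The two-level AMAT formula nests the one-level formula: the L1 miss penalty is itself the AMAT of L2. A short sketch with assumed parameters (1-cycle L1 hit, 5% L1 miss rate, 10-cycle L2 hit, 20% local L2 miss rate, 100-cycle memory penalty); none of these numbers come from the slides:

```python
import math


def amat(hit_time, miss_rate, miss_penalty):
    """Single-level average memory access time, in cycles."""
    return hit_time + miss_rate * miss_penalty


def amat_two_level(hit_l1, miss_rate_l1, hit_l2, miss_rate_l2, miss_penalty_l2):
    """Two-level AMAT; miss_rate_l2 is the LOCAL miss rate of L2."""
    miss_penalty_l1 = amat(hit_l2, miss_rate_l2, miss_penalty_l2)
    return amat(hit_l1, miss_rate_l1, miss_penalty_l1)


with_l2 = amat_two_level(1, 0.05, 10, 0.20, 100)
without_l2 = amat(1, 0.05, 100)
print(with_l2, without_l2)  # about 2.5 cycles vs. about 6 cycles
assert math.isclose(with_l2, 2.5)
```

Under these assumptions, adding the L2 cuts the average access time from about 6 cycles to about 2.5 cycles even though it catches only 80% of L1 misses.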
37 Placement Policy [Figure: block numbers in memory and set numbers in an 8-block cache] Block 12 can be placed: Fully associative: anywhere. (2-way) set associative: anywhere in set 0 (12 mod 4). Direct mapped: only into block 4 (12 mod 8)
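The three placements of block 12 are all instances of one rule: the set number is the block number modulo the number of sets, and the block may go in any way of that set. A minimal sketch, assuming an 8-block cache as implied by "12 mod 8"; the function name and the frame numbering (ways of a set stored contiguously) are illustrative choices:

```python
def candidate_frames(block, num_blocks, ways):
    """Cache frames where a memory block may be placed.

    ways == 1            -> direct mapped
    ways == num_blocks   -> fully associative
    anything in between  -> set associative
    """
    num_sets = num_blocks // ways
    set_index = block % num_sets
    first = set_index * ways
    return list(range(first, first + ways))


print(candidate_frames(12, 8, ways=1))  # [4]        direct mapped: 12 mod 8
print(candidate_frames(12, 8, ways=2))  # [0, 1]     2-way: set 0 = 12 mod 4
print(candidate_frames(12, 8, ways=8))  # [0, 1, 2, 3, 4, 5, 6, 7]  fully associative
```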
38 Address Bit-Field Partitioning The address (e.g., 32-bit) issued by the CPU is generally divided into 3 fields. Tag: Serves as the unique identifier for a group of data. Different regions of memory may be mapped to the same cache location/block; the tag is used to differentiate between them. Index: It is used to index into the cache structure. Block Offset: The least significant bits are used to determine the exact data word. If the block size is B then $b = \log_2 B$ bits will be needed in the address to specify the data word. Address layout: Tag (t bits), Index (k bits), Block Offset (b bits)
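The field split is just shifting and masking. The sketch below assumes a 64-byte block (b = 6) and 256 lines (k = 8), and uses an arbitrary example address; these values are illustrative, not from the slides:

```python
def split_address(addr, k, b):
    """Split an address into (tag, index, block_offset).

    b = log2(block size in bytes), k = log2(number of lines or sets).
    The tag is whatever remains above the index field.
    """
    offset = addr & ((1 << b) - 1)
    index = (addr >> b) & ((1 << k) - 1)
    tag = addr >> (b + k)
    return tag, index, offset


tag, index, offset = split_address(0x12345678, k=8, b=6)
print(hex(tag), hex(index), hex(offset))  # 0x48d1 0x59 0x38

# The three fields reassemble into the original address.
assert (tag << 14) | (index << 6) | offset == 0x12345678
```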
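39 Direct-Mapped Cache [Figure: address fields Tag (t bits), Index (k bits), Block Offset (b bits); the index selects one of 2^k lines, each holding a valid bit V, a tag, and a data block; the stored tag is compared with the address tag to signal HIT; the block offset selects the data word or byte]
40 Direct Map Address Selection [Figure: same structure with the address fields ordered Index (k bits), Tag (t bits), Block Offset (b bits); 2^k lines; tag compare signals HIT; output is a data word or byte]
41 Hashed Address Selection [Figure: same structure with the line selected by a hash of the address; 2^k lines; t-bit tag compare signals HIT; the block offset (b bits) selects the data word or byte]
42 2-Way Set-Associative Cache [Figure: address fields Tag (t bits), Index (k bits), Block Offset (b bits); the index selects a set of two lines, each with V, tag, and data block; two tag comparators; HIT if either matches; output is a data word or byte]
43 Fully Associative Cache [Figure: address fields Tag (t bits) and Block Offset (b bits); the tag of every line is compared in parallel; HIT if any matches; output is a data word or byte]
44 Write Performance [Figure: direct-mapped cache with 2^k lines; address fields Tag (t bits), Index (k bits), Block Offset (b bits); the tag compare produces HIT, which drives the write enable (WE) of the data array]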
45 Improving Cache Performance Average memory access time = Hit time + Miss rate x Miss penalty. To improve performance: Reduce the hit time. Reduce the miss rate (e.g., larger cache). Reduce the miss penalty (e.g., L2 cache). What is the simplest design strategy? Biggest cache that doesn't increase hit time past 1-2 cycles (approx 8-32KB in modern technology)
46 Effect of Cache on Performance Larger cache size: Reduces capacity and conflict misses; hit time will increase. Higher associativity: Reduces conflict misses; may increase hit time. Larger block size: Reduces compulsory misses; exploits burst transfers in memory and on buses; increases miss penalty and conflict misses
47 Replacement Policy Which block from a set should be evicted? Random Least Recently Used (LRU) LRU cache state must be updated on every access True implementation only feasible for small sets (2-way) Pseudo-LRU binary tree often used for 4-8 way First In, First Out (FIFO) a.k.a. Round-Robin Used in highly associative caches Not Least Recently Used (NLRU) FIFO with exception for most recently used block or blocks
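As a software illustration of true LRU within one set, the sketch below keeps the tags of a set in recency order and evicts the oldest on a miss. It models only the policy; real hardware tracks recency with a few state bits per set rather than an ordered map, and the class name is hypothetical:

```python
from collections import OrderedDict


class LRUSet:
    """One cache set with true LRU replacement (policy model only)."""

    def __init__(self, ways):
        self.ways = ways
        self.lines = OrderedDict()  # tag -> None, least recently used first

    def access(self, tag):
        """Return True on hit, False on miss; update LRU state either way."""
        if tag in self.lines:
            self.lines.move_to_end(tag)      # now most recently used
            return True
        if len(self.lines) >= self.ways:
            self.lines.popitem(last=False)   # evict least recently used
        self.lines[tag] = None
        return False


s = LRUSet(ways=2)
hits = [s.access(t) for t in ["A", "B", "A", "C", "B"]]
print(hits)           # [False, False, True, False, False]
print(list(s.lines))  # ['C', 'B']
```

In the trace, the hit on A makes B the least recently used line, so C evicts B, and the following access to B misses and evicts A.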
48 Reducing Write Hit Time Problem: Writes take two cycles in memory stage, one cycle for tag check plus one cycle for data write if hit. Solutions: Design data RAM that can perform read and write in one cycle, restore old value after tag miss. Fully-associative (CAM Tag) caches: Word line only enabled if hit. Pipelined writes: Hold write data for store in single buffer ahead of cache, write cache data during next store's tag check
49 Victim Caches [Figure: CPU register file, L1 data cache, unified L2 cache, and a fully associative (FA) 4-block victim cache; data evicted from L1 enters the victim cache; hit data (miss in L1) returns from the victim cache; where data evicted from the victim cache goes is left as a question] Victim cache is a small associative backup cache, added to a direct-mapped cache, which holds recently evicted lines
50 Victim Caches Victim cache is a small associative backup cache, added to a direct-mapped cache, which holds recently evicted lines. 1. First look up in direct-mapped cache 2. If miss, look in victim cache 3. If hit in victim cache, swap hit line with line now evicted from L1 4. If miss in victim cache, L1 victim -> VC, VC victim -> ? Fast hit time of direct mapped but with reduced conflict misses
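The four lookup steps can be sketched as a small behavioral model. This is an illustration under stated assumptions: blocks are identified by block number, the victim cache is managed FIFO, and a line evicted from the victim cache is simply dropped (the slides leave its destination open). The class and method names are hypothetical:

```python
from collections import OrderedDict


class DirectMappedWithVictim:
    """Direct-mapped L1 backed by a small fully associative victim cache."""

    def __init__(self, num_lines, victim_entries=4):
        self.num_lines = num_lines
        self.l1 = [None] * num_lines      # l1[index] = block number or None
        self.victim = OrderedDict()       # block number -> None, oldest first
        self.victim_entries = victim_entries

    def access(self, block):
        index = block % self.num_lines
        if self.l1[index] == block:       # step 1: L1 lookup
            return "l1_hit"
        evicted = self.l1[index]
        if block in self.victim:          # steps 2-3: VC hit, swap lines
            del self.victim[block]
            result = "victim_hit"
        else:                             # step 4: miss in both
            result = "miss"
        self.l1[index] = block
        if evicted is not None:
            if len(self.victim) >= self.victim_entries:
                self.victim.popitem(last=False)  # VC victim dropped (assumption)
            self.victim[evicted] = None
        return result


cache = DirectMappedWithVictim(num_lines=4)
# Blocks 0 and 4 both map to line 0 and would otherwise keep evicting each other.
print([cache.access(b) for b in [0, 4, 0, 4]])
# ['miss', 'miss', 'victim_hit', 'victim_hit']
```

Without the victim cache, the last two accesses in this trace would be conflict misses.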
51 Delayed Write Timing [Figure: timing diagram with Tag, Data, and Buffer rows over time for the access sequence LD 0, ST 1, ST 2, LD 3, ST 4, LD 5; each store's data is held in the buffer and written to the data array during a later store's tag check]
52 Pipelining Cache Writes Data from a store hit written into data portion of cache during tag access of subsequent store [Figure: address and store data from the CPU split into Tag, Index, and Store Data; delayed write address and delayed write data registers; tag and data arrays with comparators and a load/store select; outputs Hit? and load data to CPU]
53 Write Policy Choices Cache hit: Write through: Write both cache & memory. Generally higher traffic but simplifies cache coherence. Write back: Write cache only (memory is written only when the entry is evicted). A dirty bit per block can further reduce the traffic
54 Write Policy Choices Cache miss: No write allocate: only write to main memory. Write allocate (aka fetch on write): fetch into cache. Common combinations: Write through and no write allocate. Write back with write allocate
55 Reducing Read Miss Penalty [Figure: CPU register file, data cache, write buffer, unified L2 cache; the write buffer between the data cache and L2 holds evicted dirty lines for a writeback cache, or all writes for a writethru cache]
56 Reducing Read Miss Penalty Problem: Write buffer may hold updated value of location needed by a read miss RAW data hazard Stall: On a read miss, wait for the write buffer to go empty Bypass: Check write buffer addresses against read miss addresses, if no match, allow read miss to go ahead of writes, else, return value in write buffer
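The bypass option can be modeled as a lookup in the write buffer before going to memory. A minimal sketch, assuming word-granularity addresses and a dictionary standing in for the next memory level; the names `WriteBuffer` and `read_miss` are illustrative:

```python
class WriteBuffer:
    """Pending writes not yet drained to the next level, oldest first."""

    def __init__(self):
        self.entries = []  # list of (address, value)

    def push(self, addr, value):
        self.entries.append((addr, value))

    def lookup(self, addr):
        """Return (True, value) for the youngest pending write to addr, else (False, None)."""
        for pending_addr, value in reversed(self.entries):
            if pending_addr == addr:
                return True, value
        return False, None


def read_miss(addr, write_buffer, memory):
    """Bypass policy: forward from the write buffer on a match, else read memory."""
    found, value = write_buffer.lookup(addr)
    if found:
        return value      # RAW hazard avoided by forwarding the buffered value
    return memory[addr]   # no match: the read may go ahead of buffered writes


memory = {0x100: 1, 0x200: 2}
wb = WriteBuffer()
wb.push(0x100, 98)
wb.push(0x100, 99)
print(read_miss(0x100, wb, memory))  # 99, youngest buffered write wins
print(read_miss(0x200, wb, memory))  # 2, served from memory
```

The stall option needs no address comparison, but it makes every read miss wait for the whole buffer to drain.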
57 Prefetching Speculate on future instruction and data accesses and fetch them into cache(s) Instruction accesses easier to predict than data accesses Varieties of prefetching Hardware prefetching Software prefetching Mixed schemes What types of misses does prefetching affect?
58 Issues in Prefetching Usefulness: should produce hits. Timeliness: not late and not too early. Cache and bandwidth pollution. Most recent: security / side-channel issues [Figure: prefetched data flows from the unified L2 cache into the L1 instruction and L1 data caches next to the CPU register file]
59 Hardware Instruction Prefetching Instruction prefetch in Alpha AXP Fetch two blocks on a miss; the requested block (i) and the next consecutive block (i+1) Requested block placed in cache, and next block in instruction stream buffer If miss in cache but hit in stream buffer, move stream buffer block into cache and prefetch next block (i+2) [Figure: on a miss, the requested block goes from the unified L2 cache to the L1 instruction cache, and the prefetched instruction block goes to the stream buffer]
60 Hardware Data Prefetching Prefetch-on-miss: Prefetch b + 1 upon miss on b One Block Lookahead (OBL) scheme Initiate prefetch for block b + 1 when block b is accessed Why is this different from doubling block size? Can extend to N block lookahead Strided prefetch If observe sequence of accesses to block b, b+n, b+2n, then prefetch b+3n etc.
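A strided prefetcher only needs the last few block addresses to detect a constant stride. The sketch below is a simplified model: it looks at one global access history rather than per-instruction state, and the `degree` parameter (how many blocks ahead to prefetch) is an assumption added for illustration:

```python
def strided_prefetch(history, degree=1):
    """Return block numbers to prefetch, given recent block accesses.

    Prefetches only when the last three accesses b, b+n, b+2n share a
    constant non-zero stride n; then returns b+3n, b+4n, ... (degree blocks).
    """
    if len(history) < 3:
        return []
    stride_old = history[-2] - history[-3]
    stride_new = history[-1] - history[-2]
    if stride_old != stride_new or stride_new == 0:
        return []
    return [history[-1] + stride_new * i for i in range(1, degree + 1)]


print(strided_prefetch([10, 14, 18]))            # [22]
print(strided_prefetch([10, 14, 18], degree=2))  # [22, 26]
print(strided_prefetch([10, 14, 19]))            # []  (no constant stride)
```

One block lookahead corresponds to the special case of stride 1 applied on every access, without waiting for a pattern to be confirmed.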
61 Itanium-2 On-Chip Caches Level 1: 16KB, 4-way s.a., 64B line, quad-port (2 load + 2 store), single cycle latency. Level 2: 256KB, 4-way s.a., 128B line, quad-port (4 load or 4 store), five cycle latency. Level 3: 3MB, 12-way s.a., 128B line, single 32B port, twelve cycle latency
62 Next Class Advanced Memory Operations
More informationLECTURE 4: LARGE AND FAST: EXPLOITING MEMORY HIERARCHY
LECTURE 4: LARGE AND FAST: EXPLOITING MEMORY HIERARCHY Abridged version of Patterson & Hennessy (2013):Ch.5 Principle of Locality Programs access a small proportion of their address space at any time Temporal
More information1/19/2009. Data Locality. Exploiting Locality: Caches
Spring 2009 Prof. Hyesoon Kim Thanks to Prof. Loh & Prof. Prvulovic Data Locality Temporal: if data item needed now, it is likely to be needed again in near future Spatial: if data item needed now, nearby
More informationChapter 5A. Large and Fast: Exploiting Memory Hierarchy
Chapter 5A Large and Fast: Exploiting Memory Hierarchy Memory Technology Static RAM (SRAM) Fast, expensive Dynamic RAM (DRAM) In between Magnetic disk Slow, inexpensive Ideal memory Access time of SRAM
More informationMemory Hierarchy Computing Systems & Performance MSc Informatics Eng. Memory Hierarchy (most slides are borrowed)
Computing Systems & Performance Memory Hierarchy MSc Informatics Eng. 2012/13 A.J.Proença Memory Hierarchy (most slides are borrowed) AJProença, Computer Systems & Performance, MEI, UMinho, 2012/13 1 2
More informationIntroduction to cache memories
Course on: Advanced Computer Architectures Introduction to cache memories Prof. Cristina Silvano Politecnico di Milano email: cristina.silvano@polimi.it 1 Summary Summary Main goal Spatial and temporal
More informationCS 61C: Great Ideas in Computer Architecture. Direct Mapped Caches
CS 61C: Great Ideas in Computer Architecture Direct Mapped Caches Instructor: Justin Hsia 7/05/2012 Summer 2012 Lecture #11 1 Review of Last Lecture Floating point (single and double precision) approximates
More informationMemory Technology. Caches 1. Static RAM (SRAM) Dynamic RAM (DRAM) Magnetic disk. Ideal memory. 0.5ns 2.5ns, $2000 $5000 per GB
Memory Technology Caches 1 Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM) 50ns 70ns, $20 $75 per GB Magnetic disk 5ms 20ms, $0.20 $2 per GB Ideal memory Average access time similar
More informationChapter 5B. Large and Fast: Exploiting Memory Hierarchy
Chapter 5B Large and Fast: Exploiting Memory Hierarchy One Transistor Dynamic RAM 1-T DRAM Cell word access transistor V REF TiN top electrode (V REF ) Ta 2 O 5 dielectric bit Storage capacitor (FET gate,
More informationSystems Programming and Computer Architecture ( ) Timothy Roscoe
Systems Group Department of Computer Science ETH Zürich Systems Programming and Computer Architecture (252-0061-00) Timothy Roscoe Herbstsemester 2016 AS 2016 Caches 1 16: Caches Computer Architecture
More informationChapter Seven Morgan Kaufmann Publishers
Chapter Seven Memories: Review SRAM: value is stored on a pair of inverting gates very fast but takes up more space than DRAM (4 to 6 transistors) DRAM: value is stored as a charge on capacitor (must be
More informationMemory Hierarchy Computing Systems & Performance MSc Informatics Eng. Memory Hierarchy (most slides are borrowed)
Computing Systems & Performance Memory Hierarchy MSc Informatics Eng. 2011/12 A.J.Proença Memory Hierarchy (most slides are borrowed) AJProença, Computer Systems & Performance, MEI, UMinho, 2011/12 1 2
More informationChapter Seven. SRAM: value is stored on a pair of inverting gates very fast but takes up more space than DRAM (4 to 6 transistors)
Chapter Seven emories: Review SRA: value is stored on a pair of inverting gates very fast but takes up more space than DRA (4 to transistors) DRA: value is stored as a charge on capacitor (must be refreshed)
More informationMemory hier ar hier ch ar y ch rev re i v e i w e ECE 154B Dmitri Struko Struk v o
Memory hierarchy review ECE 154B Dmitri Strukov Outline Cache motivation Cache basics Opteron example Cache performance Six basic optimizations Virtual memory Processor DRAM gap (latency) Four issue superscalar
More informationComputer Architecture A Quantitative Approach, Fifth Edition. Chapter 2. Memory Hierarchy Design. Copyright 2012, Elsevier Inc. All rights reserved.
Computer Architecture A Quantitative Approach, Fifth Edition Chapter 2 Memory Hierarchy Design 1 Introduction Programmers want unlimited amounts of memory with low latency Fast memory technology is more
More informationMemory Hierarchy, Fully Associative Caches. Instructor: Nick Riasanovsky
Memory Hierarchy, Fully Associative Caches Instructor: Nick Riasanovsky Review Hazards reduce effectiveness of pipelining Cause stalls/bubbles Structural Hazards Conflict in use of datapath component Data
More informationCS3350B Computer Architecture
CS335B Computer Architecture Winter 25 Lecture 32: Exploiting Memory Hierarchy: How? Marc Moreno Maza wwwcsduwoca/courses/cs335b [Adapted from lectures on Computer Organization and Design, Patterson &
More informationPortland State University ECE 587/687. Caches and Memory-Level Parallelism
Portland State University ECE 587/687 Caches and Memory-Level Parallelism Revisiting Processor Performance Program Execution Time = (CPU clock cycles + Memory stall cycles) x clock cycle time For each
More informationChapter 5. Large and Fast: Exploiting Memory Hierarchy
Chapter 5 Large and Fast: Exploiting Memory Hierarchy Static RAM (SRAM) Dynamic RAM (DRAM) 50ns 70ns, $20 $75 per GB Magnetic disk 0.5ns 2.5ns, $2000 $5000 per GB 5.1 Introduction Memory Technology 5ms
More informationCaching Basics. Memory Hierarchies
Caching Basics CS448 1 Memory Hierarchies Takes advantage of locality of reference principle Most programs do not access all code and data uniformly, but repeat for certain data choices spatial nearby
More informationThe Memory Hierarchy & Cache The impact of real memory on CPU Performance. Main memory basic properties: Memory Types: DRAM vs.
The Hierarchical Memory System The Memory Hierarchy & Cache The impact of real memory on CPU Performance. Main memory basic properties: Memory Types: DRAM vs. SRAM The Motivation for The Memory Hierarchy:
More informationA Cache Hierarchy in a Computer System
A Cache Hierarchy in a Computer System Ideally one would desire an indefinitely large memory capacity such that any particular... word would be immediately available... We are... forced to recognize the
More informationCHAPTER 4 MEMORY HIERARCHIES TYPICAL MEMORY HIERARCHY TYPICAL MEMORY HIERARCHY: THE PYRAMID CACHE PERFORMANCE MEMORY HIERARCHIES CACHE DESIGN
CHAPTER 4 TYPICAL MEMORY HIERARCHY MEMORY HIERARCHIES MEMORY HIERARCHIES CACHE DESIGN TECHNIQUES TO IMPROVE CACHE PERFORMANCE VIRTUAL MEMORY SUPPORT PRINCIPLE OF LOCALITY: A PROGRAM ACCESSES A RELATIVELY
More informationCSF Improving Cache Performance. [Adapted from Computer Organization and Design, Patterson & Hennessy, 2005]
CSF Improving Cache Performance [Adapted from Computer Organization and Design, Patterson & Hennessy, 2005] Review: The Memory Hierarchy Take advantage of the principle of locality to present the user
More informationCS252 Spring 2017 Graduate Computer Architecture. Lecture 11: Memory
CS252 Spring 2017 Graduate Computer Architecture Lecture 11: Memory Lisa Wu, Krste Asanovic http://inst.eecs.berkeley.edu/~cs252/sp17 WU UCB CS252 SP17 Logistics for the 15-min meeting next Tuesday Email
More informationRecap: Machine Organization
ECE232: Hardware Organization and Design Part 14: Hierarchy Chapter 5 (4 th edition), 7 (3 rd edition) http://www.ecs.umass.edu/ece/ece232/ Adapted from Computer Organization and Design, Patterson & Hennessy,
More informationAdvanced Memory Organizations
CSE 3421: Introduction to Computer Architecture Advanced Memory Organizations Study: 5.1, 5.2, 5.3, 5.4 (only parts) Gojko Babić 03-29-2018 1 Growth in Performance of DRAM & CPU Huge mismatch between CPU
More informationCS 152 Computer Architecture and Engineering. Lecture 8 - Memory Hierarchy-III
CS 152 Computer Architecture and Engineering Lecture 8 - Memory Hierarchy-III Krste Asanovic Electrical Engineering and Computer Sciences University of California at Berkeley http://www.eecs.berkeley.edu/~krste!
More informationLecture 12. Memory Design & Caches, part 2. Christos Kozyrakis Stanford University
Lecture 12 Memory Design & Caches, part 2 Christos Kozyrakis Stanford University http://eeclass.stanford.edu/ee108b 1 Announcements HW3 is due today PA2 is available on-line today Part 1 is due on 2/27
More informationMemory Hierarchy and Caches
Memory Hierarchy and Caches COE 301 / ICS 233 Computer Organization Dr. Muhamed Mudawar College of Computer Sciences and Engineering King Fahd University of Petroleum and Minerals Presentation Outline
More informationTDT 4260 lecture 3 spring semester 2015
1 TDT 4260 lecture 3 spring semester 2015 Lasse Natvig, The CARD group Dept. of computer & information science NTNU http://research.idi.ntnu.no/multicore 2 Lecture overview Repetition Chap.1: Performance,
More informationECE468 Computer Organization and Architecture. Memory Hierarchy
ECE468 Computer Organization and Architecture Hierarchy ECE468 memory.1 The Big Picture: Where are We Now? The Five Classic Components of a Computer Processor Control Input Datapath Output Today s Topic:
More informationEECS151/251A Spring 2018 Digital Design and Integrated Circuits. Instructors: John Wawrzynek and Nick Weaver. Lecture 19: Caches EE141
EECS151/251A Spring 2018 Digital Design and Integrated Circuits Instructors: John Wawrzynek and Nick Weaver Lecture 19: Caches Cache Introduction 40% of this ARM CPU is devoted to SRAM cache. But the role
More informationLecture 14: Cache Innovations and DRAM. Today: cache access basics and innovations, DRAM (Sections )
Lecture 14: Cache Innovations and DRAM Today: cache access basics and innovations, DRAM (Sections 5.1-5.3) 1 Reducing Miss Rate Large block size reduces compulsory misses, reduces miss penalty in case
More information3Introduction. Memory Hierarchy. Chapter 2. Memory Hierarchy Design. Computer Architecture A Quantitative Approach, Fifth Edition
Computer Architecture A Quantitative Approach, Fifth Edition Chapter 2 Memory Hierarchy Design 1 Introduction Programmers want unlimited amounts of memory with low latency Fast memory technology is more
More informationCOSC4201. Chapter 5. Memory Hierarchy Design. Prof. Mokhtar Aboelaze York University
COSC4201 Chapter 5 Memory Hierarchy Design Prof. Mokhtar Aboelaze York University 1 Memory Hierarchy The gap between CPU performance and main memory has been widening with higher performance CPUs creating
More informationMemory latency: Affects cache miss penalty. Measured by:
Main Memory Main memory generally utilizes Dynamic RAM (DRAM), which use a single transistor to store a bit, but require a periodic data refresh by reading every row. Static RAM may be used for main memory
More informationComputer Architecture and System Software Lecture 09: Memory Hierarchy. Instructor: Rob Bergen Applied Computer Science University of Winnipeg
Computer Architecture and System Software Lecture 09: Memory Hierarchy Instructor: Rob Bergen Applied Computer Science University of Winnipeg Announcements Midterm returned + solutions in class today SSD
More informationPage 1. Memory Hierarchies (Part 2)
Memory Hierarchies (Part ) Outline of Lectures on Memory Systems Memory Hierarchies Cache Memory 3 Virtual Memory 4 The future Increasing distance from the processor in access time Review: The Memory Hierarchy
More informationMemory Hierarchy. Slides contents from:
Memory Hierarchy Slides contents from: Hennessy & Patterson, 5ed Appendix B and Chapter 2 David Wentzlaff, ELE 475 Computer Architecture MJT, High Performance Computing, NPTEL Memory Performance Gap Memory
More informationCS 33. Architecture and Optimization (3) CS33 Intro to Computer Systems XVI 1 Copyright 2018 Thomas W. Doeppner. All rights reserved.
CS 33 Architecture and Optimization (3) CS33 Intro to Computer Systems XVI 1 Copyright 2018 Thomas W. Doeppner. All rights reserved. Hyper Threading Instruction Control Instruction Control Retirement Unit
More informationCENG 3420 Computer Organization and Design. Lecture 08: Cache Review. Bei Yu
CENG 3420 Computer Organization and Design Lecture 08: Cache Review Bei Yu CEG3420 L08.1 Spring 2016 A Typical Memory Hierarchy q Take advantage of the principle of locality to present the user with as
More informationAdapted from David Patterson s slides on graduate computer architecture
Mei Yang Adapted from David Patterson s slides on graduate computer architecture Introduction Ten Advanced Optimizations of Cache Performance Memory Technology and Optimizations Virtual Memory and Virtual
More informationCS3350B Computer Architecture
CS3350B Computer Architecture Winter 2015 Lecture 3.1: Memory Hierarchy: What and Why? Marc Moreno Maza www.csd.uwo.ca/courses/cs3350b [Adapted from lectures on Computer Organization and Design, Patterson
More informationCPS101 Computer Organization and Programming Lecture 13: The Memory System. Outline of Today s Lecture. The Big Picture: Where are We Now?
cps 14 memory.1 RW Fall 2 CPS11 Computer Organization and Programming Lecture 13 The System Robert Wagner Outline of Today s Lecture System the BIG Picture? Technology Technology DRAM A Real Life Example
More information