Who Cares About the Memory Hierarchy? Time CS 152 Lec16.6. Performance. Recap

Size: px
Start display at page:

Download "Who Cares About the Memory Hierarchy? Time CS 152 Lec16.6. Performance. Recap"

Transcription

1 CS 152 Lec16.1 Recap CS152: Computer rchitecture and Engineering Lecture 16: Locality and Technology MIPS I instruction set architecture made pipeline visible (delayed branch, delayed load) More performance from deeper pipelines, parallelism Increasing length of pipe increases impact of hazards; pipelining helps instruction bandwidth, not latency SW Pipelining Symbolic Loop Unrolling to get most from pipeline with little code expansion, little overhead ynamic Branch Prediction + early branch address for speculative execution Superscalar and VLIW CPI < 1 CS 152 Lec16.2 ynamic issue vs. Static issue More instructions issue at same time, larger the penalty of hazards Intel EPIC in I-64 a hybrid: compact LIW + data hazard check The Big Picture: Where are We Now? The Five Classic Components of a Computer Processor Control atapath Today s Topics: Locality and Hierarchy Technology RM Technology Organization Input Output Technology Trends (from 1st lecture) Capacity Speed (latency) Logic: 2x in 3 years 2x in 3 years RM: 4x in 3 years 2x in 10 years isk: 4x in 3 years 2x in 10 years RM Year Size Cycle 1000:1! 2:1! Kb 250 ns Kb 220 ns Mb 190 ns Mb 165 ns Mb 145 ns Mb 120 ns CS 152 Lec16.3 CS 152 Lec16.4 Performance Who Cares bout the Hierarchy? 1000 Processor-RM Gap (latency) µproc CPU 60%/yr. Moore s Law (2X/1.5yr) Processor- Performance Gap: (grows 50% / year) RM RM 9%/yr. (2X/10 yrs) Today s Situation: Microprocessor Rely on caches to bridge gap Microprocessor-RM performance gap time of a full cache miss in instructions executed 1st lpha (7000): 340 ns/5.0 ns = 68 clks x 2 or 136 instructions 2nd lpha (8400): 266 ns/3.3 ns = 80 clks x 4 or 320 instructions 3rd lpha (t.b.d.): 180 ns/1.7 ns =108 clks x 6 or 648 instructions 1/2X latency x 3X clock rate x 3X Instr/clock ~5X CS 152 Lec16.5 CS 152 Lec16.6

2 CS 152 Lec16.7 Impact on Performance Suppose a processor executes at Clock Rate = 200 MHz (5 ns per cycle) CPI = % arith/logic, 30% ld/st, 20% control Suppose that 10% of memory operations get 50 cycle miss penalty atamiss (1.6) 49% CPI = ideal CPI + average stalls per instruction = 1.1(cyc) +( 0.30 (datamops/ins) x 0.10 (miss/datamop) x 50 (cycle/miss) ) = 1.1 cycle cycle = % of the time the processor is stalled waiting for memory! Inst Miss (0.5) 16% a 1% instruction miss rate would add an additional 0.5 cycles to the CPI! Ideal CPI (1.1) 35% CS 152 Lec16.8 The Goal: illusion of large, fast, cheap memory Fact: Large memories are slow, fast memories are small How do we create a memory that is large, cheap and fast (most of the time)? Hierarchy Parallelism n Expanded View of the System Why hierarchy works Processor Control The Principle of Locality: Program access a relatively small portion of the address space at any instant of time. Probability of reference atapath 0 ddress Space 2^n - 1 Speed: Size: Cost: Fastest Smallest Highest Slowest Biggest Lowest CS 152 Lec16.9 CS 152 Lec16.10 Hierarchy: How oes it Work? Hierarchy: Terminology Temporal Locality (Locality in ): => Keep most recently accessed data items closer to the processor Spatial Locality (Locality in Space): => Move blocks which consist of contiguous words to the upper levels To Processor From Processor Upper Level Blk X Lower Level Blk Y Hit: data appears in some block in the upper level (example: Block X) Hit Rate: the fraction of memory access found in the upper level Hit : to access the upper level which consists of RM access time + to determine hit/miss Miss: data needs to be retrieved from a block in the lower level (Block Y) Miss Rate = 1 - (Hit Rate) Miss Penalty: to replace a block in the upper level + to deliver the block the processor Hit << Miss Penalty To Processor From Processor Upper Level Blk X Lower Level Blk Y CS 152 Lec16.11 CS 152 Lec16.12

3 CS 152 Lec16.13 Hierarchy of a Modern Computer System By taking advantage of the principle of locality: Present the user with as much memory as is available in the cheapest technology. Provide access at the speed offered by the fastest technology. atapath Processor Control Registers On-Chip Cache Second Level Cache () Main (RM) Secondary Storage (isk) Tertiary Storage (isk) How is the hierarchy managed? Registers <-> by compiler (programmer?) cache <-> memory by the hardware memory <-> disks by the hardware and operating system (virtual memory) by the programmer (files) Speed (ns): 1s 10s 100s Size (bytes): 100s Ks Ms 10,000,000s (10s ms) Gs 10,000,000,000s (10s sec) Ts CS 152 Lec16.14 Hierarchy Technology Random ccess: Random is good: access time is the same for all locations RM: ynamic Random ccess - High density, low power, cheap, slow - ynamic: need to be refreshed regularly : Static Random ccess - Low density, high power, expensive, fast - Static: content will last forever (until lose power) Non-so-random ccess Technology: ccess time varies from location to location and from time to time Examples: isk, CROM Sequential ccess Technology: access time linear in location (e.g.,tape) The next two lectures will concentrate on random access technology The Main : RMs + Caches: s Main Background Performance of Main : Latency: Cache Miss Penalty - ccess : time between request and word arrives - Cycle : time between requests Bandwidth: & Large Block Miss Penalty (L2) Main is RM: ynamic Random ccess ynamic since needs to be refreshed periodically (8 ms) ddresses divided into 2 halves ( as a 2 matrix): - RS or Row ccess Strobe - CS or Column ccess Strobe Cache uses : Static Random ccess No refresh (6 transistors/ vs. 1 transistor) Size: RM/ 4-8, Cost/Cycle time: /RM 8-16 CS 152 Lec16.15 CS 152 Lec16.16 Random ccess (RM) Technology Static RM Why do computer designers need to know about RM technology? Processor performance is usually limited by memory bandwidth s IC densities increase, lots of memory will fit on processor chip - Tailor on-chip memory to specific needs - Instruction cache - ata cache - Write buffer What makes RM different from a bunch of flip-flops? ensity: RM is much more denser 6-Transistor Write: 1. rive lines (=1, =0) 2.. Select row word (row select) Read: 1. Precharge and to Vdd 2.. Select row 3. pulls one line low 1 word replaced with pullup to save area 4. Sense amp on column detects difference between and CS 152 Lec16.17 CS 152 Lec16.18

4 CS 152 Lec16.19 Typical Organization: 16-word x 4- in 3 Wr river & Wr river & Wr river & Wr river & - Precharger+ - Precharger+ - Precharger+ - Precharger+ : : : : - Sense mp + - Sense mp + - Sense mp + - Sense mp + out 3 in 2 out 2 in 1 out 1 in 0 out 0 Precharge Word 0 Word 1 Word 15 ddress ecoder WrEn Logic iagram of a Typical Write Enable is usually active low () in and out are combined to save pins: new control signal, output enable () is needed is asserted (Low), is disasserted (High) - serves as the data input pin is disasserted (High), is asserted (Low) - is the data output pin Both and are asserted: - Result is unknown. on t do that!!! lthough could change VHL to do whatever you desire, must do the best with what you ve got (vs. what you need) CS 152 Lec16.20 N 2 N words x M M Typical Timing Problems with Write Timing: ata In Write ddress N Write Setup High Z 2 N words x M Write Hold M Read Timing: Junk Read ddress Read ccess ata Out Read ddress Read ccess ata Out On Select = 1 Off On On Off N1 N2 = 1 = 0 Six transistors use up a lot of area Consider a Zero is stored in the cell: Transistor N1 will try to pull to 0 Transistor P2 will try to pull bar to 1 P1 But lines are precharged high: re P1 and P2 necessary? P2 On CS 152 Lec16.21 CS 152 Lec Transistor (RM) Classical RM Organization (square) (data) lines Write: 1. rive line 2.. Select row Read: 1. Precharge line to Vdd 2.. Select row 3. and line share charges - Very small voltage changes on the line 4. Sense (fancy sense amp) - Can detect changes of ~10-100k electrons 5. Write: restore the value row select r o w d e c o d e r row address RM rray Column Selector & Circuits Each intersection represents a 1-T RM word (row) select Column ddress Refresh 1. Just do a dummy read to every cell. CS 152 Lec16.23 CS 152 Lec16.24 data Row and Column ddress together: Select 1 a time

5 CS 152 Lec16.25 RM logical organization (4 M) RM physical organization (4 M) Column ecoder 11 Sense mps & Column ddress 8 s 0 10 rray (2,048 x 2,048) Q Row ddress Block Row ec. 9 : 512 Block Row ec. 9 : 512 Block Row ec. 9 : 512 Block Row ec. 9 : Q Square root of s per RS/CS Word Line Storage Block 0 Block 3 8 s CS 152 Lec16.26 Systems Logic iagram of a Typical RM RS_L CS_L address n RM Controller Timing Controller Tc = Tcycle + Tcontroller + Tdriver RM 2^n x 1 chip w Bus rivers 256K x 8 9 RM 8 Control Signals (RS_L, CS_L,, ) are all active low in and out are combined (): is asserted (Low), is disasserted (High) - serves as the data input pin is disasserted (High), is asserted (Low) - is the data output pin Row and column addresses share the same pins () RS_L goes low: Pins are latched in as row address CS_L goes low: Pins are latched in as column address RS/CS edge-sensitive CS 152 Lec16.27 CS 152 Lec16.28 Key RM Timing Parameters t RC : minimum time from RS line falling to the valid data output. Quoted as the speed of a RM fast 4Mb RM t RC = 60 ns t RC : minimum time from the start of one row access to the start of the next. t RC = 110 ns for a 4M RM with a t RC of 60 ns t CC : minimum time from CS line falling to valid data output. 15 ns for a 4M RM with a t RC of 60 ns t PC : minimum time from the start of one column access to the start of the next. 35 ns for a 4M RM with a t RC of 60 ns RM Performance 60 ns (t RC ) RM can perform a row access only every 110 ns (t RC ) perform column access (t CC ) in 15 ns, but time between column accesses is at least 35 ns (t PC ). - In practice, external address delays and turning around buses make it 40 to 50 ns These times do not include the time to drive the addresses off the microprocessor nor the memory controller overhead. rive parallel RMs, external memory controller, bus to turn around, SIMM module, pins 180 ns to 250 ns latency from processor to memory is good for a 60 ns (t RC ) RM CS 152 Lec16.29 CS 152 Lec16.30

6 CS 152 Lec16.31 RM Write Timing RM Read Timing Every RM access begins at: The assertion of the RS_L 2 ways to write: early or late v. CS RM WR Cycle RS_L CS_L 256K x 8 9 RM 8 Every RM access begins at: The assertion of the RS_L 2 ways to read: early or late v. CS RM Read Cycle RS_L CS_L 256K x 8 9 RM 8 RS_L RS_L CS_L CS_L Row ddress Col ddress Junk Row ddress Col ddress Junk Row ddress Col ddress Junk Row ddress Col ddress Junk Junk ata In Junk ata In Junk WR ccess Early Wr Cycle: asserted before CS_L WR ccess Late Wr Cycle: asserted after CS_L High Z Junk ata Out High Z ata Out CS 152 Lec16.32 Read ccess Early Read Cycle: asserted before CS_L Output Enable elay Late Read Cycle: asserted after CS_L Main Performance Cycle versus ccess Cycle Simple: CPU, Cache, Bus, same width (32 s) Wide: CPU/Mux 1 word; Mux/Cache, Bus, N words (lpha: 64 s & 256 s) Interleaved: CPU, Cache, Bus 1 word: N Modules (4 Modules); example is word interleaved ccess RM (Read/Write) Cycle >> RM (Read/Write) ccess 2:1; why? RM (Read/Write) Cycle : How frequently can you initiate an access? RM (Read/Write) ccess : How quickly will you get what you want once you initiate an access? RM Bandwidth Limitation : How much data can you get from the memory? CS 152 Lec16.33 CS 152 Lec16.34 Increasing Bandwidth - Interleaving Main Performance ccess Pattern without Interleaving: 1 available Start ccess for 1 ccess Pattern with 4-way Interleaving: ccess Bank 0 CS 152 Lec16.35 Start ccess for 2 ccess Bank 1 ccess Bank 2 ccess Bank 3 We can ccess Bank 0 again CPU CPU Bank 0 Bank 1 Bank 2 Bank 3 CS 152 Lec16.36 Timing to access one 4 word cache block 1 to send address, 6 access time, 1 to send data Cache Block is 4 words Simple M.P. = 4 x (1+6+1) = 32 Wide M.P. = = 8 Interleaved M.P. = x1 = 11

7 CS 152 Lec16.37 Independent Banks How many banks? number banks = number clocks to access word in bank For sequential accesses, otherwise will return to original bank before it has next word ready Increasing RM => fewer chips => harder to have banks Growth s/chip RM : 50%-60%/yr Nathan Myrvold M/S: mature software growth (33%/yr for NT) growth MB/$ of RM (25%-30%/yr) (from Pete MacWilliams, Intel) Minimum PC Size CS 152 Lec16.38 Fewer RMs/System over 4 MB 8 MB 16 MB 32 MB 64 MB 128 MB 256 MB RM Generation Mb 4 Mb 16 Mb 64 Mb 256 Mb 1 Gb 32 8 per 16 4 RM 60% / year 8 2 per System 25%-30% / year Page Mode RM: Motivation Fast Page Mode Operation Regular RM Organization: N rows x N column x M- Read & Write M- at a time Each M- access requires a RS / CS cycle Fast Page Mode RM N x M register to save a row Column ddress N rows N cols RM M- Output M s Row ddress Fast Page Mode RM N x M to save a row fter a row is read into the register Only CS is needed to access other M- blocks on that row RS_L remains asserted while CS_L is toggled Column ddress N rows M- Output N cols RM N x M M s Row ddress RS_L 1st M- ccess 2nd M- ccess RS_L 1st M- ccess 2nd M- 3rd M- 4th M- CS_L CS_L Row ddress Col ddress Junk Row ddress Col ddress Junk Row ddress Col ddress Col ddress Col ddress Col ddress CS 152 Lec16.39 CS 152 Lec16.40 RM v. esktop Microprocessor Cultures RM esign Goals Standards pinout, package, refresh rate, binary compatibility, IEEE 754, bus capacity,... Sources Multiple Single Figures 1) capacity, 2) $/ 1) SPEC speed of Merit 3) BW, 4) latency 2) cost Improve 1) 60%, 2) 25%, 1) 60%, Rate/year 3) 20%, 4) 7% 2) little change Reduce cell size 2.5, increase die size 1.5 Sell 10% of a single RM generation 6.25 billion RMs sold in phases: engineering samples, first customer ship(fcs), mass production Fastest to FCS, mass production wins share ie size, testing time, yield => profit Yield >> 60% (redundant rows/columns to repair flaws) CS 152 Lec16.41 CS 152 Lec16.42

8 CS 152 Lec16.43 RM History Summary: RMs: capacity +60%/yr, cost 30%/yr 2.5X cells/area, 1.5X die size in 3 years 97 RM fab line costs $1B to $2B RM only: density, leakage v. speed Rely on increasing no. of computers & memory per computer (60% market) SIMM or IMM is replaceable unit => computers use any generation RM Commodity, second source industry => high volume, low profit, conservative Little organization innovation in 20 years page mode, EO, Synch RM Order of importance: 1) Cost/ 2) Capacity RMBUS: 10X BW, cost increase?? Two ifferent Types of Locality: Temporal Locality (Locality in ): If an item is referenced, it will tend to be referenced again soon. CS 152 Lec16.44 Spatial Locality (Locality in Space): If an item is referenced, items whose addresses are close by tend to be referenced soon. By taking advantage of the principle of locality: Present the user with as much memory as is available in the cheapest technology. Provide access at the speed offered by the fastest technology. RM is slow but cheap and dense: Good choice for presenting the user with a BIG memory system is fast but expensive and not very dense: Good choice for providing the user FST access time. Summary: Processor- Performance Gap Tax Processor % rea %Transistors (~cost) (~power) lpha % 77% Strongrm S110 61% 94% Pentium Pro 64% 88% 2 dies per package: Proc/I$/$ + L2$ Caches have no inherent value, only try to close performance gap CS 152 Lec16.45

EEC 483 Computer Organization

EEC 483 Computer Organization EEC 483 Computer Organization Chapter 5 Large and Fast: Exploiting Memory Hierarchy Chansu Yu Table of Contents Ch.1 Introduction Ch. 2 Instruction: Machine Language Ch. 3-4 CPU Implementation Ch. 5 Cache

More information

CpE 442. Memory System

CpE 442. Memory System CpE 442 Memory System CPE 442 memory.1 Outline of Today s Lecture Recap and Introduction (5 minutes) Memory System: the BIG Picture? (15 minutes) Memory Technology: SRAM and Register File (25 minutes)

More information

Memory Hierarchy Technology. The Big Picture: Where are We Now? The Five Classic Components of a Computer

Memory Hierarchy Technology. The Big Picture: Where are We Now? The Five Classic Components of a Computer The Big Picture: Where are We Now? The Five Classic Components of a Computer Processor Control Datapath Today s Topics: technologies Technology trends Impact on performance Hierarchy The principle of locality

More information

CS152 Computer Architecture and Engineering Lecture 16: Memory System

CS152 Computer Architecture and Engineering Lecture 16: Memory System CS152 Computer Architecture and Engineering Lecture 16: System March 15, 1995 Dave Patterson (patterson@cs) and Shing Kong (shing.kong@eng.sun.com) Slides available on http://http.cs.berkeley.edu/~patterson

More information

ECE468 Computer Organization and Architecture. Memory Hierarchy

ECE468 Computer Organization and Architecture. Memory Hierarchy ECE468 Computer Organization and Architecture Hierarchy ECE468 memory.1 The Big Picture: Where are We Now? The Five Classic Components of a Computer Processor Control Input Datapath Output Today s Topic:

More information

CPS101 Computer Organization and Programming Lecture 13: The Memory System. Outline of Today s Lecture. The Big Picture: Where are We Now?

CPS101 Computer Organization and Programming Lecture 13: The Memory System. Outline of Today s Lecture. The Big Picture: Where are We Now? cps 14 memory.1 RW Fall 2 CPS11 Computer Organization and Programming Lecture 13 The System Robert Wagner Outline of Today s Lecture System the BIG Picture? Technology Technology DRAM A Real Life Example

More information

ECE7995 (4) Basics of Memory Hierarchy. [Adapted from Mary Jane Irwin s slides (PSU)]

ECE7995 (4) Basics of Memory Hierarchy. [Adapted from Mary Jane Irwin s slides (PSU)] ECE7995 (4) Basics of Memory Hierarchy [Adapted from Mary Jane Irwin s slides (PSU)] Major Components of a Computer Processor Devices Control Memory Input Datapath Output Performance Processor-Memory Performance

More information

Memories: Memory Technology

Memories: Memory Technology Memories: Memory Technology Z. Jerry Shi Assistant Professor of Computer Science and Engineering University of Connecticut * Slides adapted from Blumrich&Gschwind/ELE475 03, Peh/ELE475 * Memory Hierarchy

More information

CSC Memory System. A. A Hierarchy and Driving Forces

CSC Memory System. A. A Hierarchy and Driving Forces CSC1016 1. System A. A Hierarchy and Driving Forces 1A_1 The Big Picture: The Five Classic Components of a Computer Processor Input Control Datapath Output Topics: Motivation for Hierarchy View of Hierarchy

More information

CENG 3420 Computer Organization and Design. Lecture 08: Memory - I. Bei Yu

CENG 3420 Computer Organization and Design. Lecture 08: Memory - I. Bei Yu CENG 3420 Computer Organization and Design Lecture 08: Memory - I Bei Yu CEG3420 L08.1 Spring 2016 Outline q Why Memory Hierarchy q How Memory Hierarchy? SRAM (Cache) & DRAM (main memory) Memory System

More information

CENG3420 Lecture 08: Memory Organization

CENG3420 Lecture 08: Memory Organization CENG3420 Lecture 08: Memory Organization Bei Yu byu@cse.cuhk.edu.hk (Latest update: February 22, 2018) Spring 2018 1 / 48 Overview Introduction Random Access Memory (RAM) Interleaving Secondary Memory

More information

ECE232: Hardware Organization and Design

ECE232: Hardware Organization and Design ECE232: Hardware Organization and Design Lecture 21: Memory Hierarchy Adapted from Computer Organization and Design, Patterson & Hennessy, UCB Overview Ideally, computer memory would be large and fast

More information

,e-pg PATHSHALA- Computer Science Computer Architecture Module 25 Memory Hierarchy Design - Basics

,e-pg PATHSHALA- Computer Science Computer Architecture Module 25 Memory Hierarchy Design - Basics ,e-pg PATHSHALA- Computer Science Computer Architecture Module 25 Memory Hierarchy Design - Basics The objectives of this module are to discuss about the need for a hierarchical memory system and also

More information

Recap: Machine Organization

Recap: Machine Organization ECE232: Hardware Organization and Design Part 14: Hierarchy Chapter 5 (4 th edition), 7 (3 rd edition) http://www.ecs.umass.edu/ece/ece232/ Adapted from Computer Organization and Design, Patterson & Hennessy,

More information

The Memory Hierarchy & Cache

The Memory Hierarchy & Cache Removing The Ideal Memory Assumption: The Memory Hierarchy & Cache The impact of real memory on CPU Performance. Main memory basic properties: Memory Types: DRAM vs. SRAM The Motivation for The Memory

More information

CSE 431 Computer Architecture Fall Chapter 5A: Exploiting the Memory Hierarchy, Part 1

CSE 431 Computer Architecture Fall Chapter 5A: Exploiting the Memory Hierarchy, Part 1 CSE 431 Computer Architecture Fall 2008 Chapter 5A: Exploiting the Memory Hierarchy, Part 1 Mary Jane Irwin ( www.cse.psu.edu/~mji ) [Adapted from Computer Organization and Design, 4 th Edition, Patterson

More information

ECE ECE4680

ECE ECE4680 ECE468. -4-7 The otivation for s System ECE468 Computer Organization and Architecture DRA Hierarchy System otivation Large memories (DRA) are slow Small memories (SRA) are fast ake the average access time

More information

CENG4480 Lecture 09: Memory 1

CENG4480 Lecture 09: Memory 1 CENG4480 Lecture 09: Memory 1 Bei Yu byu@cse.cuhk.edu.hk (Latest update: November 8, 2017) Fall 2017 1 / 37 Overview Introduction Memory Principle Random Access Memory (RAM) Non-Volatile Memory Conclusion

More information

Topic 21: Memory Technology

Topic 21: Memory Technology Topic 21: Memory Technology COS / ELE 375 Computer Architecture and Organization Princeton University Fall 2015 Prof. David August 1 Old Stuff Revisited Mercury Delay Line Memory Maurice Wilkes, in 1947,

More information

Topic 21: Memory Technology

Topic 21: Memory Technology Topic 21: Memory Technology COS / ELE 375 Computer Architecture and Organization Princeton University Fall 2015 Prof. David August 1 Old Stuff Revisited Mercury Delay Line Memory Maurice Wilkes, in 1947,

More information

14:332:331. Week 13 Basics of Cache

14:332:331. Week 13 Basics of Cache 14:332:331 Computer Architecture and Assembly Language Spring 2006 Week 13 Basics of Cache [Adapted from Dave Patterson s UCB CS152 slides and Mary Jane Irwin s PSU CSE331 slides] 331 Week131 Spring 2006

More information

Lecture 10: Memory System -- Memory Technology CSE 564 Computer Architecture Summer 2017

Lecture 10: Memory System -- Memory Technology CSE 564 Computer Architecture Summer 2017 Lecture 10: Memory System -- Memory Technology CSE 564 Computer Architecture Summer 2017 Department of Computer Science and Engineering Yonghong Yan yan@oakland.edu www.secs.oakland.edu/~yan 1 Topics for

More information

Review: Performance Latency vs. Throughput. Time (seconds/program) is performance measure Instructions Clock cycles Seconds.

Review: Performance Latency vs. Throughput. Time (seconds/program) is performance measure Instructions Clock cycles Seconds. Performance 980 98 982 983 984 985 986 987 988 989 990 99 992 993 994 995 996 997 998 999 2000 7/4/20 CS 6C: Great Ideas in Computer Architecture (Machine Structures) Caches Instructor: Michael Greenbaum

More information

Memory Technologies. Technology Trends

Memory Technologies. Technology Trends . 5 Technologies Random access technologies Random good access time same for all locations DRAM Dynamic Random Access High density, low power, cheap, but slow Dynamic need to be refreshed regularly SRAM

More information

EE 4683/5683: COMPUTER ARCHITECTURE

EE 4683/5683: COMPUTER ARCHITECTURE EE 4683/5683: COMPUTER ARCHITECTURE Lecture 6A: Cache Design Avinash Kodi, kodi@ohioedu Agenda 2 Review: Memory Hierarchy Review: Cache Organization Direct-mapped Set- Associative Fully-Associative 1 Major

More information

Memory latency: Affects cache miss penalty. Measured by:

Memory latency: Affects cache miss penalty. Measured by: Main Memory Main memory generally utilizes Dynamic RAM (DRAM), which use a single transistor to store a bit, but require a periodic data refresh by reading every row. Static RAM may be used for main memory

More information

Main Memory. EECC551 - Shaaban. Memory latency: Affects cache miss penalty. Measured by:

Main Memory. EECC551 - Shaaban. Memory latency: Affects cache miss penalty. Measured by: Main Memory Main memory generally utilizes Dynamic RAM (DRAM), which use a single transistor to store a bit, but require a periodic data refresh by reading every row (~every 8 msec). Static RAM may be

More information

Modern Computer Architecture

Modern Computer Architecture Modern Computer Architecture Lecture3 Review of Memory Hierarchy Hongbin Sun 国家集成电路人才培养基地 Xi an Jiaotong University Performance 1000 Recap: Who Cares About the Memory Hierarchy? Processor-DRAM Memory Gap

More information

CENG 3420 Computer Organization and Design. Lecture 08: Cache Review. Bei Yu

CENG 3420 Computer Organization and Design. Lecture 08: Cache Review. Bei Yu CENG 3420 Computer Organization and Design Lecture 08: Cache Review Bei Yu CEG3420 L08.1 Spring 2016 A Typical Memory Hierarchy q Take advantage of the principle of locality to present the user with as

More information

Memory latency: Affects cache miss penalty. Measured by:

Memory latency: Affects cache miss penalty. Measured by: Main Memory Main memory generally utilizes Dynamic RAM (DRAM), which use a single transistor to store a bit, but require a periodic data refresh by reading every row. Static RAM may be used for main memory

More information

The Memory Hierarchy & Cache The impact of real memory on CPU Performance. Main memory basic properties: Memory Types: DRAM vs.

The Memory Hierarchy & Cache The impact of real memory on CPU Performance. Main memory basic properties: Memory Types: DRAM vs. The Hierarchical Memory System The Memory Hierarchy & Cache The impact of real memory on CPU Performance. Main memory basic properties: Memory Types: DRAM vs. SRAM The Motivation for The Memory Hierarchy:

More information

ECE 485/585 Microprocessor System Design

ECE 485/585 Microprocessor System Design Microprocessor System Design Lecture 5: Zeshan Chishti DRAM Basics DRAM Evolution SDRAM-based Memory Systems Electrical and Computer Engineering Dept. Maseeh College of Engineering and Computer Science

More information

Memory. Lecture 22 CS301

Memory. Lecture 22 CS301 Memory Lecture 22 CS301 Administrative Daily Review of today s lecture w Due tomorrow (11/13) at 8am HW #8 due today at 5pm Program #2 due Friday, 11/16 at 11:59pm Test #2 Wednesday Pipelined Machine Fetch

More information

LECTURE 10: Improving Memory Access: Direct and Spatial caches

LECTURE 10: Improving Memory Access: Direct and Spatial caches EECS 318 CAD Computer Aided Design LECTURE 10: Improving Memory Access: Direct and Spatial caches Instructor: Francis G. Wolff wolff@eecs.cwru.edu Case Western Reserve University This presentation uses

More information

CS 61C: Great Ideas in Computer Architecture. Direct Mapped Caches

CS 61C: Great Ideas in Computer Architecture. Direct Mapped Caches CS 61C: Great Ideas in Computer Architecture Direct Mapped Caches Instructor: Justin Hsia 7/05/2012 Summer 2012 Lecture #11 1 Review of Last Lecture Floating point (single and double precision) approximates

More information

Memory Hierarchy: Motivation

Memory Hierarchy: Motivation Memory Hierarchy: Motivation The gap between CPU performance and main memory speed has been widening with higher performance CPUs creating performance bottlenecks for memory access instructions. The memory

More information

Course Administration

Course Administration Spring 207 EE 363: Computer Organization Chapter 5: Large and Fast: Exploiting Memory Hierarchy - Avinash Kodi Department of Electrical Engineering & Computer Science Ohio University, Athens, Ohio 4570

More information

Lecture 20: Memory Hierarchy Main Memory and Enhancing its Performance. Grinch-Like Stuff

Lecture 20: Memory Hierarchy Main Memory and Enhancing its Performance. Grinch-Like Stuff Lecture 20: ory Hierarchy Main ory and Enhancing its Performance Professor Alvin R. Lebeck Computer Science 220 Fall 1999 HW #4 Due November 12 Projects Finish reading Chapter 5 Grinch-Like Stuff CPS 220

More information

Mainstream Computer System Components

Mainstream Computer System Components Mainstream Computer System Components Double Date Rate (DDR) SDRAM One channel = 8 bytes = 64 bits wide Current DDR3 SDRAM Example: PC3-12800 (DDR3-1600) 200 MHz (internal base chip clock) 8-way interleaved

More information

Memory Hierarchy: The motivation

Memory Hierarchy: The motivation Memory Hierarchy: The motivation The gap between CPU performance and main memory has been widening with higher performance CPUs creating performance bottlenecks for memory access instructions. The memory

More information

Pipelining, Instruction Level Parallelism and Memory in Processors. Advanced Topics ICOM 4215 Computer Architecture and Organization Fall 2010

Pipelining, Instruction Level Parallelism and Memory in Processors. Advanced Topics ICOM 4215 Computer Architecture and Organization Fall 2010 Pipelining, Instruction Level Parallelism and Memory in Processors Advanced Topics ICOM 4215 Computer Architecture and Organization Fall 2010 NOTE: The material for this lecture was taken from several

More information

Computer System Components

Computer System Components Computer System Components CPU Core 1 GHz - 3.2 GHz 4-way Superscaler RISC or RISC-core (x86): Deep Instruction Pipelines Dynamic scheduling Multiple FP, integer FUs Dynamic branch prediction Hardware

More information

14:332:331. Week 13 Basics of Cache

14:332:331. Week 13 Basics of Cache 14:332:331 Computer Architecture and Assembly Language Fall 2003 Week 13 Basics of Cache [Adapted from Dave Patterson s UCB CS152 slides and Mary Jane Irwin s PSU CSE331 slides] 331 Lec20.1 Fall 2003 Head

More information

Mainstream Computer System Components CPU Core 2 GHz GHz 4-way Superscaler (RISC or RISC-core (x86): Dynamic scheduling, Hardware speculation

Mainstream Computer System Components CPU Core 2 GHz GHz 4-way Superscaler (RISC or RISC-core (x86): Dynamic scheduling, Hardware speculation Mainstream Computer System Components CPU Core 2 GHz - 3.0 GHz 4-way Superscaler (RISC or RISC-core (x86): Dynamic scheduling, Hardware speculation One core or multi-core (2-4) per chip Multiple FP, integer

More information

The Memory Hierarchy & Cache Review of Memory Hierarchy & Cache Basics (from 350):

The Memory Hierarchy & Cache Review of Memory Hierarchy & Cache Basics (from 350): The Memory Hierarchy & Cache Review of Memory Hierarchy & Cache Basics (from 350): Motivation for The Memory Hierarchy: { CPU/Memory Performance Gap The Principle Of Locality Cache $$$$$ Cache Basics:

More information

Chapter Seven. Memories: Review. Exploiting Memory Hierarchy CACHE MEMORY AND VIRTUAL MEMORY

Chapter Seven. Memories: Review. Exploiting Memory Hierarchy CACHE MEMORY AND VIRTUAL MEMORY Chapter Seven CACHE MEMORY AND VIRTUAL MEMORY 1 Memories: Review SRAM: value is stored on a pair of inverting gates very fast but takes up more space than DRAM (4 to 6 transistors) DRAM: value is stored

More information

Memory Hierarchy. Maurizio Palesi. Maurizio Palesi 1

Memory Hierarchy. Maurizio Palesi. Maurizio Palesi 1 Memory Hierarchy Maurizio Palesi Maurizio Palesi 1 References John L. Hennessy and David A. Patterson, Computer Architecture a Quantitative Approach, second edition, Morgan Kaufmann Chapter 5 Maurizio

More information

CS152 Computer Architecture and Engineering Lecture 17: Cache System

CS152 Computer Architecture and Engineering Lecture 17: Cache System CS152 Computer Architecture and Engineering Lecture 17 System March 17, 1995 Dave Patterson (patterson@cs) and Shing Kong (shing.kong@eng.sun.com) Slides available on http//http.cs.berkeley.edu/~patterson

More information

Chapter 5. Large and Fast: Exploiting Memory Hierarchy

Chapter 5. Large and Fast: Exploiting Memory Hierarchy Chapter 5 Large and Fast: Exploiting Memory Hierarchy Review: Major Components of a Computer Processor Devices Control Memory Input Datapath Output Secondary Memory (Disk) Main Memory Cache Performance

More information

CS 61C: Great Ideas in Computer Architecture. The Memory Hierarchy, Fully Associative Caches

CS 61C: Great Ideas in Computer Architecture. The Memory Hierarchy, Fully Associative Caches CS 61C: Great Ideas in Computer Architecture The Memory Hierarchy, Fully Associative Caches Instructor: Alan Christopher 7/09/2014 Summer 2014 -- Lecture #10 1 Review of Last Lecture Floating point (single

More information

Donn Morrison Department of Computer Science. TDT4255 Memory hierarchies

Donn Morrison Department of Computer Science. TDT4255 Memory hierarchies TDT4255 Lecture 10: Memory hierarchies Donn Morrison Department of Computer Science 2 Outline Chapter 5 - Memory hierarchies (5.1-5.5) Temporal and spacial locality Hits and misses Direct-mapped, set associative,

More information

Chapter 5A. Large and Fast: Exploiting Memory Hierarchy

Chapter 5A. Large and Fast: Exploiting Memory Hierarchy Chapter 5A Large and Fast: Exploiting Memory Hierarchy Memory Technology Static RAM (SRAM) Fast, expensive Dynamic RAM (DRAM) In between Magnetic disk Slow, inexpensive Ideal memory Access time of SRAM

More information

Memory Hierarchy. Maurizio Palesi. Maurizio Palesi 1

Memory Hierarchy. Maurizio Palesi. Maurizio Palesi 1 Memory Hierarchy Maurizio Palesi Maurizio Palesi 1 References John L. Hennessy and David A. Patterson, Computer Architecture a Quantitative Approach, second edition, Morgan Kaufmann Chapter 5 Maurizio

More information

LECTURE 11. Memory Hierarchy

LECTURE 11. Memory Hierarchy LECTURE 11 Memory Hierarchy MEMORY HIERARCHY When it comes to memory, there are two universally desirable properties: Large Size: ideally, we want to never have to worry about running out of memory. Speed

More information

ELEC 5200/6200 Computer Architecture and Design Spring 2017 Lecture 7: Memory Organization Part II

ELEC 5200/6200 Computer Architecture and Design Spring 2017 Lecture 7: Memory Organization Part II ELEC 5200/6200 Computer Architecture and Design Spring 2017 Lecture 7: Organization Part II Ujjwal Guin, Assistant Professor Department of Electrical and Computer Engineering Auburn University, Auburn,

More information

Memory hierarchy Outline

Memory hierarchy Outline Memory hierarchy Outline Performance impact Principles of memory hierarchy Memory technology and basics 2 Page 1 Performance impact Memory references of a program typically determine the ultimate performance

More information

EECS151/251A Spring 2018 Digital Design and Integrated Circuits. Instructors: John Wawrzynek and Nick Weaver. Lecture 19: Caches EE141

EECS151/251A Spring 2018 Digital Design and Integrated Circuits. Instructors: John Wawrzynek and Nick Weaver. Lecture 19: Caches EE141 EECS151/251A Spring 2018 Digital Design and Integrated Circuits Instructors: John Wawrzynek and Nick Weaver Lecture 19: Caches Cache Introduction 40% of this ARM CPU is devoted to SRAM cache. But the role

More information

COSC 6385 Computer Architecture. - Memory Hierarchies (I)

COSC 6385 Computer Architecture. - Memory Hierarchies (I) COSC 6385 Computer Architecture - Hierarchies (I) Fall 2007 Slides are based on a lecture by David Culler, University of California, Berkley http//www.eecs.berkeley.edu/~culler/courses/cs252-s05 Recap

More information

Memory Hierarchies. Instructor: Dmitri A. Gusev. Fall Lecture 10, October 8, CS 502: Computers and Communications Technology

Memory Hierarchies. Instructor: Dmitri A. Gusev. Fall Lecture 10, October 8, CS 502: Computers and Communications Technology Memory Hierarchies Instructor: Dmitri A. Gusev Fall 2007 CS 502: Computers and Communications Technology Lecture 10, October 8, 2007 Memories SRAM: value is stored on a pair of inverting gates very fast

More information

The Memory Hierarchy Part I

The Memory Hierarchy Part I Chapter 6 The Memory Hierarchy Part I The slides of Part I are taken in large part from V. Heuring & H. Jordan, Computer Systems esign and Architecture 1997. 1 Outline: Memory components: RAM memory cells

More information

Chapter 5. Large and Fast: Exploiting Memory Hierarchy

Chapter 5. Large and Fast: Exploiting Memory Hierarchy Chapter 5 Large and Fast: Exploiting Memory Hierarchy Processor-Memory Performance Gap 10000 µproc 55%/year (2X/1.5yr) Performance 1000 100 10 1 1980 1983 1986 1989 Moore s Law Processor-Memory Performance

More information

Memory Hierarchy, Fully Associative Caches. Instructor: Nick Riasanovsky

Memory Hierarchy, Fully Associative Caches. Instructor: Nick Riasanovsky Memory Hierarchy, Fully Associative Caches Instructor: Nick Riasanovsky Review Hazards reduce effectiveness of pipelining Cause stalls/bubbles Structural Hazards Conflict in use of datapath component Data

More information

Copyright 2012, Elsevier Inc. All rights reserved.

Copyright 2012, Elsevier Inc. All rights reserved. Computer Architecture A Quantitative Approach, Fifth Edition Chapter 2 Memory Hierarchy Design Edited by Mansour Al Zuair 1 Introduction Programmers want unlimited amounts of memory with low latency Fast

More information

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface COEN-4710 Computer Hardware Lecture 7 Large and Fast: Exploiting Memory Hierarchy (Chapter 5) Cristinel Ababei Marquette University Department

More information

Review: Tomasulo Organization. CS152 Computer Architecture and Engineering Lecture 17. Finish speculation Locality and Memory Technology

Review: Tomasulo Organization. CS152 Computer Architecture and Engineering Lecture 17. Finish speculation Locality and Memory Technology Review: Tomasulo Organization CS152 Computer Architecture and Engineering Lecture 17 Finish speculation Locality and Technology )URP0HP )32S 4XHXH /RDG%XIIHUV /RDG /RDG /RDG /RDG /RDG /RDG )35HJLVWHUV

More information

EEC 170 Computer Architecture Fall Cache Introduction Review. Review: The Memory Hierarchy. The Memory Hierarchy: Why Does it Work?

EEC 170 Computer Architecture Fall Cache Introduction Review. Review: The Memory Hierarchy. The Memory Hierarchy: Why Does it Work? EEC 17 Computer Architecture Fall 25 Introduction Review Review: The Hierarchy Take advantage of the principle of locality to present the user with as much memory as is available in the cheapest technology

More information

CPE 631 Lecture 04: CPU Caches

CPE 631 Lecture 04: CPU Caches Lecture 04 CPU Caches Electrical and Computer Engineering University of Alabama in Huntsville Outline Memory Hierarchy Four Questions for Memory Hierarchy Cache Performance 26/01/2004 UAH- 2 1 Processor-DR

More information

Lecture-14 (Memory Hierarchy) CS422-Spring

Lecture-14 (Memory Hierarchy) CS422-Spring Lecture-14 (Memory Hierarchy) CS422-Spring 2018 Biswa@CSE-IITK The Ideal World Instruction Supply Pipeline (Instruction execution) Data Supply - Zero-cycle latency - Infinite capacity - Zero cost - Perfect

More information

UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Computer Architecture ECE 568/668

UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Computer Architecture ECE 568/668 UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering Computer Architecture ECE 568/668 Part 11 Memory Hierarchy - I Israel Koren ECE568/Koren Part.11.1 ECE568/Koren Part.11.2 Ideal Memory

More information

Chapter 5 Large and Fast: Exploiting Memory Hierarchy (Part 1)

Chapter 5 Large and Fast: Exploiting Memory Hierarchy (Part 1) Department of Electr rical Eng ineering, Chapter 5 Large and Fast: Exploiting Memory Hierarchy (Part 1) 王振傑 (Chen-Chieh Wang) ccwang@mail.ee.ncku.edu.tw ncku edu Depar rtment of Electr rical Engineering,

More information

Chapter 5. Large and Fast: Exploiting Memory Hierarchy

Chapter 5. Large and Fast: Exploiting Memory Hierarchy Chapter 5 Large and Fast: Exploiting Memory Hierarchy Processor-Memory Performance Gap 10000 µproc 55%/year (2X/1.5yr) Performance 1000 100 10 1 1980 1983 1986 1989 Moore s Law Processor-Memory Performance

More information

CPU issues address (and data for write) Memory returns data (or acknowledgment for write)

CPU issues address (and data for write) Memory returns data (or acknowledgment for write) The Main Memory Unit CPU and memory unit interface Address Data Control CPU Memory CPU issues address (and data for write) Memory returns data (or acknowledgment for write) Memories: Design Objectives

More information

Textbook: Burdea and Coiffet, Virtual Reality Technology, 2 nd Edition, Wiley, Textbook web site:

Textbook: Burdea and Coiffet, Virtual Reality Technology, 2 nd Edition, Wiley, Textbook web site: Textbook: Burdea and Coiffet, Virtual Reality Technology, 2 nd Edition, Wiley, 2003 Textbook web site: www.vrtechnology.org 1 Textbook web site: www.vrtechnology.org Laboratory Hardware 2 Topics 14:332:331

More information

Chapter Seven. Large & Fast: Exploring Memory Hierarchy

Chapter Seven. Large & Fast: Exploring Memory Hierarchy Chapter Seven Large & Fast: Exploring Memory Hierarchy 1 Memories: Review SRAM (Static Random Access Memory): value is stored on a pair of inverting gates very fast but takes up more space than DRAM DRAM

More information

Multilevel Memories. Joel Emer Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology

Multilevel Memories. Joel Emer Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology 1 Multilevel Memories Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology Based on the material prepared by Krste Asanovic and Arvind CPU-Memory Bottleneck 6.823

More information

Computer Architecture A Quantitative Approach, Fifth Edition. Chapter 2. Memory Hierarchy Design. Copyright 2012, Elsevier Inc. All rights reserved.

Computer Architecture A Quantitative Approach, Fifth Edition. Chapter 2. Memory Hierarchy Design. Copyright 2012, Elsevier Inc. All rights reserved. Computer Architecture A Quantitative Approach, Fifth Edition Chapter 2 Memory Hierarchy Design 1 Introduction Programmers want unlimited amounts of memory with low latency Fast memory technology is more

More information

LRU. Pseudo LRU A B C D E F G H A B C D E F G H H H C. Copyright 2012, Elsevier Inc. All rights reserved.

LRU. Pseudo LRU A B C D E F G H A B C D E F G H H H C. Copyright 2012, Elsevier Inc. All rights reserved. LRU A list to keep track of the order of access to every block in the set. The least recently used block is replaced (if needed). How many bits we need for that? 27 Pseudo LRU A B C D E F G H A B C D E

More information

COEN-4730 Computer Architecture Lecture 3 Review of Caches and Virtual Memory

COEN-4730 Computer Architecture Lecture 3 Review of Caches and Virtual Memory 1 COEN-4730 Computer Architecture Lecture 3 Review of Caches and Virtual Memory Cristinel Ababei Dept. of Electrical and Computer Engineering Marquette University Credits: Slides adapted from presentations

More information

Lecture 18: Memory Hierarchy Main Memory and Enhancing its Performance Professor Randy H. Katz Computer Science 252 Spring 1996

Lecture 18: Memory Hierarchy Main Memory and Enhancing its Performance Professor Randy H. Katz Computer Science 252 Spring 1996 Lecture 18: Memory Hierarchy Main Memory and Enhancing its Performance Professor Randy H. Katz Computer Science 252 Spring 1996 RHK.S96 1 Review: Reducing Miss Penalty Summary Five techniques Read priority

More information

Memory Hierarchy Computing Systems & Performance MSc Informatics Eng. Memory Hierarchy (most slides are borrowed)

Memory Hierarchy Computing Systems & Performance MSc Informatics Eng. Memory Hierarchy (most slides are borrowed) Computing Systems & Performance Memory Hierarchy MSc Informatics Eng. 2011/12 A.J.Proença Memory Hierarchy (most slides are borrowed) AJProença, Computer Systems & Performance, MEI, UMinho, 2011/12 1 2

More information

Memory Hierarchy Computing Systems & Performance MSc Informatics Eng. Memory Hierarchy (most slides are borrowed)

Memory Hierarchy Computing Systems & Performance MSc Informatics Eng. Memory Hierarchy (most slides are borrowed) Computing Systems & Performance Memory Hierarchy MSc Informatics Eng. 2012/13 A.J.Proença Memory Hierarchy (most slides are borrowed) AJProença, Computer Systems & Performance, MEI, UMinho, 2012/13 1 2

More information

Chapter 7 Large and Fast: Exploiting Memory Hierarchy. Memory Hierarchy. Locality. Memories: Review

Chapter 7 Large and Fast: Exploiting Memory Hierarchy. Memory Hierarchy. Locality. Memories: Review Memories: Review Chapter 7 Large and Fast: Exploiting Hierarchy DRAM (Dynamic Random Access ): value is stored as a charge on capacitor that must be periodically refreshed, which is why it is called dynamic

More information

Chapter 5. Large and Fast: Exploiting Memory Hierarchy

Chapter 5. Large and Fast: Exploiting Memory Hierarchy Chapter 5 Large and Fast: Exploiting Memory Hierarchy Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM) 50ns 70ns, $20 $75 per GB Magnetic disk 5ms 20ms, $0.20 $2 per

More information

Copyright 2012, Elsevier Inc. All rights reserved.

Copyright 2012, Elsevier Inc. All rights reserved. Computer Architecture A Quantitative Approach, Fifth Edition Chapter 2 Memory Hierarchy Design 1 Introduction Introduction Programmers want unlimited amounts of memory with low latency Fast memory technology

More information

Mainstream Computer System Components

Mainstream Computer System Components Mainstream Computer System Components Double Date Rate (DDR) SDRAM One channel = 8 bytes = 64 bits wide Current DDR3 SDRAM Example: PC3-2800 (DDR3-600) 200 MHz (internal base chip clock) 8-way interleaved

More information

Where We Are in This Course Right Now. ECE 152 Introduction to Computer Architecture. This Unit: Caches and Memory Hierarchies.

Where We Are in This Course Right Now. ECE 152 Introduction to Computer Architecture. This Unit: Caches and Memory Hierarchies. Introduction to Computer Architecture Caches and emory Hierarchies Copyright 2012 Daniel J. Sorin Duke University Slides are derived from work by Amir Roth (Penn) and Alvin Lebeck (Duke) Spring 2012 Where

More information

Computer Architecture. A Quantitative Approach, Fifth Edition. Chapter 2. Memory Hierarchy Design. Copyright 2012, Elsevier Inc. All rights reserved.

Computer Architecture. A Quantitative Approach, Fifth Edition. Chapter 2. Memory Hierarchy Design. Copyright 2012, Elsevier Inc. All rights reserved. Computer Architecture A Quantitative Approach, Fifth Edition Chapter 2 Memory Hierarchy Design 1 Programmers want unlimited amounts of memory with low latency Fast memory technology is more expensive per

More information

EEC 170 Computer Architecture Fall Improving Cache Performance. Administrative. Review: The Memory Hierarchy. Review: Principle of Locality

EEC 170 Computer Architecture Fall Improving Cache Performance. Administrative. Review: The Memory Hierarchy. Review: Principle of Locality Administrative EEC 7 Computer Architecture Fall 5 Improving Cache Performance Problem #6 is posted Last set of homework You should be able to answer each of them in -5 min Quiz on Wednesday (/7) Chapter

More information

Memory Hierarchy and Caches

Memory Hierarchy and Caches Memory Hierarchy and Caches COE 301 / ICS 233 Computer Organization Dr. Muhamed Mudawar College of Computer Sciences and Engineering King Fahd University of Petroleum and Minerals Presentation Outline

More information

EITF20: Computer Architecture Part4.1.1: Cache - 2

EITF20: Computer Architecture Part4.1.1: Cache - 2 EITF20: Computer Architecture Part4.1.1: Cache - 2 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Cache performance optimization Bandwidth increase Reduce hit time Reduce miss penalty Reduce miss

More information

CSF Cache Introduction. [Adapted from Computer Organization and Design, Patterson & Hennessy, 2005]

CSF Cache Introduction. [Adapted from Computer Organization and Design, Patterson & Hennessy, 2005] CSF Cache Introduction [Adapted from Computer Organization and Design, Patterson & Hennessy, 2005] Review: The Memory Hierarchy Take advantage of the principle of locality to present the user with as much

More information

LECTURE 5: MEMORY HIERARCHY DESIGN

LECTURE 5: MEMORY HIERARCHY DESIGN LECTURE 5: MEMORY HIERARCHY DESIGN Abridged version of Hennessy & Patterson (2012):Ch.2 Introduction Programmers want unlimited amounts of memory with low latency Fast memory technology is more expensive

More information

Adapted from David Patterson s slides on graduate computer architecture

Adapted from David Patterson s slides on graduate computer architecture Mei Yang Adapted from David Patterson s slides on graduate computer architecture Introduction Ten Advanced Optimizations of Cache Performance Memory Technology and Optimizations Virtual Memory and Virtual

More information

CS6303 Computer Architecture Regulation 2013 BE-Computer Science and Engineering III semester 2 MARKS

CS6303 Computer Architecture Regulation 2013 BE-Computer Science and Engineering III semester 2 MARKS CS6303 Computer Architecture Regulation 2013 BE-Computer Science and Engineering III semester 2 MARKS UNIT-I OVERVIEW & INSTRUCTIONS 1. What are the eight great ideas in computer architecture? The eight

More information

Copyright 2012, Elsevier Inc. All rights reserved.

Copyright 2012, Elsevier Inc. All rights reserved. Computer Architecture A Quantitative Approach, Fifth Edition Chapter 2 Memory Hierarchy Design 1 Introduction Programmers want unlimited amounts of memory with low latency Fast memory technology is more

More information

Memory Hierarchy Basics. Ten Advanced Optimizations. Small and Simple

Memory Hierarchy Basics. Ten Advanced Optimizations. Small and Simple Memory Hierarchy Basics Six basic cache optimizations: Larger block size Reduces compulsory misses Increases capacity and conflict misses, increases miss penalty Larger total cache capacity to reduce miss

More information

Locality. Cache. Direct Mapped Cache. Direct Mapped Cache

Locality. Cache. Direct Mapped Cache. Direct Mapped Cache Locality A principle that makes having a memory hierarchy a good idea If an item is referenced, temporal locality: it will tend to be referenced again soon spatial locality: nearby items will tend to be

More information

Lecture 17 Introduction to Memory Hierarchies" Why it s important " Fundamental lesson(s)" Suggested reading:" (HP Chapter

Lecture 17 Introduction to Memory Hierarchies Why it s important  Fundamental lesson(s) Suggested reading: (HP Chapter Processor components" Multicore processors and programming" Processor comparison" vs." Lecture 17 Introduction to Memory Hierarchies" CSE 30321" Suggested reading:" (HP Chapter 5.1-5.2)" Writing more "

More information

Question?! Processor comparison!

Question?! Processor comparison! 1! 2! Suggested Readings!! Readings!! H&P: Chapter 5.1-5.2!! (Over the next 2 lectures)! Lecture 18" Introduction to Memory Hierarchies! 3! Processor components! Multicore processors and programming! Question?!

More information

Speicherarchitektur. Who Cares About the Memory Hierarchy? Technologie-Trends. Speicher-Hierarchie. Referenz-Lokalität. Caches

Speicherarchitektur. Who Cares About the Memory Hierarchy? Technologie-Trends. Speicher-Hierarchie. Referenz-Lokalität. Caches 11 Speicherarchitektur Speicher-Hierarchie Referenz-Lokalität Caches 1 Technologie-Trends Kapazitäts- Geschwindigkeitssteigerung Logik 2 fach in 3 Jahren 2 fach in 3 Jahren DRAM 4 fach in 3 Jahren 1.4

More information