CS 152 Lecture 16: Locality and Memory Technology

- Florence Rice
- 5 years ago

Transcription
CS 152 Lec16.1: Recap
CS152: Computer Architecture and Engineering
Lecture 16: Locality and Memory Technology
- MIPS-I instruction set architecture made the pipeline visible (delayed branch, delayed load)
- More performance from deeper pipelines and parallelism
- Increasing the length of the pipe increases the impact of hazards; pipelining helps instruction bandwidth, not latency
- SW pipelining: symbolic loop unrolling gets the most from the pipeline with little code expansion and little overhead
- Dynamic branch prediction + early branch address for speculative execution
- Superscalar and VLIW: CPI < 1

CS 152 Lec16.2: Recap (continued)
- Dynamic issue vs. static issue
- The more instructions that issue at the same time, the larger the penalty of hazards
- Intel EPIC in IA-64 is a hybrid: compact VLIW + data hazard check

CS 152 Lec16.3: The Big Picture: Where are We Now?
The five classic components of a computer: Processor (Control + Datapath), Memory, Input, Output.
Today's topics:
- Locality and memory hierarchy
- SRAM memory technology
- DRAM memory technology
- Memory organization

CS 152 Lec16.4: Technology Trends (from 1st lecture)

          Capacity        Speed (latency)
Logic:    2x in 3 years   2x in 3 years
DRAM:     4x in 3 years   2x in 10 years
Disk:     4x in 3 years   2x in 10 years

DRAM generations (capacity improved 1000:1, cycle time only 2:1):

Year   Size     Cycle time
1980   64 Kb    250 ns
1983   256 Kb   220 ns
1986   1 Mb     190 ns
1989   4 Mb     165 ns
1992   16 Mb    145 ns
1995   64 Mb    120 ns

CS 152 Lec16.5: Who Cares About the Memory Hierarchy?
[Figure: processor-DRAM memory gap (latency), log scale. CPU performance grows 60%/yr (Moore's Law, 2X/1.5 yr); DRAM performance grows only 9%/yr (2X/10 yrs); the processor-memory performance gap grows about 50%/year.]

CS 152 Lec16.6: Today's Situation: Microprocessor
- Rely on caches to bridge the microprocessor-DRAM performance gap
- Time of a full cache miss, in instructions executed:
  1st Alpha (7000):   340 ns / 5.0 ns =  68 clks x 2, or 136 instructions
  2nd Alpha (8400):   266 ns / 3.3 ns =  80 clks x 4, or 320 instructions
  3rd Alpha (t.b.d.): 180 ns / 1.7 ns = 108 clks x 6, or 648 instructions
- 1/2X latency x 3X clock rate x 3X instructions/clock => ~5X
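The miss-cost arithmetic for the three Alpha generations above can be sketched directly. This is a minimal Python sketch; the function name and the rounding are my own, and the slide's own figures are approximate:

```python
# Cost of a full cache miss in lost instruction slots:
# (miss latency / cycle time) clocks, times the number of
# instructions the CPU could have issued per clock.
def miss_cost(miss_ns, cycle_ns, issue_width):
    clocks = miss_ns / cycle_ns
    return clocks, clocks * issue_width

for name, miss, cyc, width in [("1st Alpha (7000)", 340, 5.0, 2),
                               ("2nd Alpha (8400)", 266, 3.3, 4),
                               ("3rd Alpha",        180, 1.7, 6)]:
    clocks, instrs = miss_cost(miss, cyc, width)
    print(f"{name}: {clocks:.0f} clks, {instrs:.0f} instruction slots")
```

The point of the loop is the trend: even though latency halves, the faster clock and wider issue make each miss cost more instructions.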
CS 152 Lec16.7: Memory Impact on Performance
Suppose a processor executes at:
- Clock rate = 200 MHz (5 ns per cycle)
- Ideal CPI = 1.1
- Instruction mix: 50% arith/logic, 30% ld/st, 20% control
Suppose that 10% of memory operations get a 50-cycle miss penalty:
CPI = ideal CPI + average stalls per instruction
    = 1.1 (cycles) + (0.30 (data mops/instr) x 0.10 (miss/data mop) x 50 (cycles/miss))
    = 1.1 cycles + 1.5 cycles = 2.6
58% of the time the processor is stalled waiting for memory!
A 1% instruction miss rate would add an additional 0.5 cycles to the CPI!
[Pie chart: Ideal CPI (1.1) 35%, Data miss (1.5) 49%, Inst miss (0.5) 16%]

CS 152 Lec16.8: The Goal: Illusion of Large, Fast, Cheap Memory
Fact: large memories are slow; fast memories are small.
How do we create a memory that is large, cheap, and fast (most of the time)?
- Hierarchy
- Parallelism

CS 152 Lec16.9: An Expanded View of the Memory System
[Figure: Processor (Control + Datapath) backed by a chain of memories. Speed: fastest to slowest; Size: smallest to biggest; Cost: highest to lowest.]

CS 152 Lec16.10: Why Hierarchy Works
The Principle of Locality: programs access a relatively small portion of the address space at any instant of time.
[Figure: probability of reference vs. address, over the address space 0 to 2^n - 1]

CS 152 Lec16.11: Memory Hierarchy: How Does it Work?
- Temporal locality (locality in time): keep the most recently accessed data items closer to the processor.
- Spatial locality (locality in space): move blocks consisting of contiguous words to the upper levels.
[Figure: processor exchanging block Blk X with the upper level and Blk Y with the lower level]

CS 152 Lec16.12: Memory Hierarchy: Terminology
- Hit: data appears in some block in the upper level (example: Block X)
  - Hit rate: the fraction of memory accesses found in the upper level
  - Hit time: time to access the upper level, which consists of RAM access time + time to determine hit/miss
- Miss: data needs to be retrieved from a block in the lower level (Block Y)
  - Miss rate = 1 - (hit rate)
  - Miss penalty: time to replace a block in the upper level + time to deliver the block to the processor
- Hit time << miss penalty
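The CPI example above, including the extra instruction-miss term, works out as follows. This is just the slide's arithmetic transcribed into Python (variable names are my own):

```python
# CPI = ideal CPI + average memory stalls per instruction, for the example:
# ideal CPI 1.1, 30% loads/stores, 10% data-miss rate, 50-cycle miss
# penalty, plus a 1% instruction miss rate.
ideal_cpi   = 1.1
data_stalls = 0.30 * 0.10 * 50   # stalls per instruction from data misses
inst_stalls = 0.01 * 50          # stalls per instruction from inst misses
cpi = ideal_cpi + data_stalls + inst_stalls

print(f"CPI = {cpi:.1f}")
print(f"time stalled on memory = {(data_stalls + inst_stalls) / cpi:.0%}")
```

With the instruction misses included, total CPI is 3.1 and the three components split roughly 35% / 49% / 16%, which is where the pie-chart percentages come from.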
CS 152 Lec16.13: Memory Hierarchy of a Modern Computer System
By taking advantage of the principle of locality:
- Present the user with as much memory as is available in the cheapest technology.
- Provide access at the speed offered by the fastest technology.
Levels: Registers -> On-chip cache -> Second-level cache (SRAM) -> Main memory (DRAM) -> Secondary storage (disk) -> Tertiary storage (disk/tape).
Speed (ns): 1s, 10s, 100s, 10,000,000s (10s of ms), 10,000,000,000s (10s of sec)
Size (bytes): 100s, Ks, Ms, Gs, Ts

CS 152 Lec16.14: How is the Hierarchy Managed?
- Registers <-> memory: by the compiler (programmer?)
- Cache <-> memory: by the hardware
- Memory <-> disks: by the hardware and operating system (virtual memory), and by the programmer (files)

CS 152 Lec16.15: Memory Hierarchy Technology
- Random access: "random" is good: access time is the same for all locations
  - DRAM: Dynamic Random Access Memory: high density, low power, cheap, slow; dynamic: needs to be refreshed regularly
  - SRAM: Static Random Access Memory: low density, high power, expensive, fast; static: content will last "forever" (until it loses power)
- "Not-so-random" access technology: access time varies from location to location and from time to time; examples: disk, CDROM
- Sequential access technology: access time linear in location (e.g., tape)
- The next two lectures will concentrate on random access technology

CS 152 Lec16.16: Main Memory Background
Performance of main memory:
- Latency: cache miss penalty
  - Access time: time between the request and when the word arrives
  - Cycle time: time between requests
- Bandwidth: I/O and large-block miss penalty (L2)
Main memory is DRAM: Dynamic Random Access Memory
- Dynamic, since it needs to be refreshed periodically (8 ms)
- Addresses divided into 2 halves (memory as a 2D matrix):
  - RAS or Row Access Strobe
  - CAS or Column Access Strobe
Cache uses SRAM: Static Random Access Memory
- No refresh (6 transistors/bit vs. 1 transistor)
- Size: DRAM/SRAM 4-8; cost & cycle time: SRAM/DRAM 8-16

CS 152 Lec16.17: Random Access Memory (RAM) Technology
Why do computer designers need to know about RAM technology?
- Processor performance is usually limited by memory bandwidth
- As IC densities increase, lots of memory will fit on the processor chip
  - Tailor on-chip memory to specific needs: instruction cache, data cache, write buffer
- What makes RAM different from a bunch of flip-flops? Density: RAM is much denser

CS 152 Lec16.18: Static RAM Cell
6-transistor SRAM cell (bit and bit-bar lines; the word line does row select).
Write:
1. Drive the bit lines (bit = 1, bit-bar = 0)
2. Select the row (word line)
Read:
1. Precharge bit and bit-bar to Vdd
2. Select the row
3. The cell pulls one bit line low
4. A sense amp on the column detects the difference between bit and bit-bar
(Loads are replaced with pull-ups to save area.)
CS 152 Lec16.19: Typical SRAM Organization: 16-word x 4-bit
[Figure: a 16 x 4 array of SRAM cells. Each bit column has a write driver, a bit-line precharger, and a sense amp; an address decoder drives row lines Word 0 through Word 15; control signals are WrEn and Precharge, with Din 0-3 / Dout 0-3 per bit.]

CS 152 Lec16.20: Logic Diagram of a Typical SRAM
- Write Enable is usually active low (WE_L)
- Din and Dout are combined (D) to save pins; a new control signal, output enable (OE_L), is needed
  - WE_L asserted (low), OE_L deasserted (high): D serves as the data input pin
  - WE_L deasserted (high), OE_L asserted (low): D is the data output pin
  - Both WE_L and OE_L asserted: the result is unknown. Don't do that!
- Although we could change the VHDL to do whatever we desire, we must do the best with what we've got (vs. what we need)

CS 152 Lec16.21: Typical SRAM Timing (2^N words x M bits)
[Timing diagram: a write drives Data In and the write address with WE_L asserted, observing the write setup and write hold times; a read drives the read address with OE_L asserted, and Data Out becomes valid after the read access time (high-Z otherwise).]

CS 152 Lec16.22: Problems with SRAM
- Six transistors use up a lot of area
- Consider: a zero is stored in the cell
  - Transistor N1 will try to pull bit to 0
  - Transistor P2 will try to pull bit-bar to 1
- But the bit lines are precharged high: are P1 and P2 necessary?

CS 152 Lec16.23: 1-Transistor Memory Cell (DRAM)
Write:
1. Drive the bit line
2. Select the row
Read:
1. Precharge the bit line to Vdd
2. Select the row
3. The cell and bit line share charge: only a very small voltage change on the bit line
4. Sense (fancy sense amp): can detect changes of ~10-100k electrons
5. Write: restore the value
Refresh:
1. Just do a dummy read to every cell

CS 152 Lec16.24: Classical DRAM Organization (square)
[Figure: a row decoder takes the row address and drives row (word) select lines into the DRAM array; a column selector and I/O circuits take the column address and select among the bit (data) lines. Each intersection represents a 1-T DRAM cell. The row and column address together select 1 bit at a time.]
CS 152 Lec16.25: DRAM Logical Organization (4 Mbit)
[Figure: an 11-bit column address feeds a column decoder through sense amps and I/O; the memory array is 2,048 x 2,048; the row address selects a word line. Roughly the square root of the bits are touched per RAS/CAS.]

CS 152 Lec16.26: DRAM Physical Organization (4 Mbit)
[Figure: the column address fans out to 8 I/Os; the array is split into blocks (Block 0 through Block 3), each with a block row decoder (9 : 512) driving word lines into the storage cells.]

CS 152 Lec16.27: Memory Systems
[Figure: a DRAM controller with a timing controller drives an n-bit address, RAS_L, and CAS_L to an array of 2^n x 1 DRAM chips, with bus drivers on the data path. Tc = Tcycle + Tcontroller + Tdriver.]

CS 152 Lec16.28: Logic Diagram of a Typical DRAM (256K x 8: 9 address pins, 8 data pins)
- Control signals (RAS_L, CAS_L, WE_L, OE_L) are all active low
- Din and Dout are combined (D):
  - WE_L asserted (low), OE_L deasserted (high): D serves as the data input pin
  - WE_L deasserted (high), OE_L asserted (low): D is the data output pin
- Row and column addresses share the same pins (A):
  - RAS_L goes low: pins A are latched in as the row address
  - CAS_L goes low: pins A are latched in as the column address
  - RAS/CAS are edge-sensitive

CS 152 Lec16.29: Key DRAM Timing Parameters
- t_RAC: minimum time from RAS falling to valid data output. Quoted as the speed of a DRAM; a fast 4 Mb DRAM has t_RAC = 60 ns.
- t_RC: minimum time from the start of one row access to the start of the next. t_RC = 110 ns for a 4 Mb DRAM with a t_RAC of 60 ns.
- t_CAC: minimum time from CAS falling to valid data output. 15 ns for a 4 Mb DRAM with a t_RAC of 60 ns.
- t_PC: minimum time from the start of one column access to the start of the next. 35 ns for a 4 Mb DRAM with a t_RAC of 60 ns.

CS 152 Lec16.30: DRAM Performance
- A 60 ns (t_RAC) DRAM can perform a row access only every 110 ns (t_RC)
- It can perform a column access (t_CAC) in 15 ns, but the time between column accesses is at least 35 ns (t_PC)
  - In practice, external address delays and turning around buses make it 40 to 50 ns
- These times do not include the time to drive the addresses off the microprocessor, nor the memory controller overhead
  - Driving parallel DRAMs, the external memory controller, bus turnaround, the SIMM module, pins...
  - 180 ns to 250 ns latency from processor to memory is good for a 60 ns (t_RAC) DRAM
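The timing parameters above bound how fast a single chip can deliver bits. A quick back-of-the-envelope in Python (my own sketch, assuming a x1-wide part and the slide's 4 Mb figures):

```python
# Peak per-chip transfer rates implied by the 4 Mb DRAM timings:
# a full row cycle every t_RC = 110 ns, but successive column
# accesses to an already-open row every t_PC = 35 ns.
t_rc = 110e-9   # row cycle time, seconds
t_pc = 35e-9    # column (page-mode) cycle time, seconds

row_rate  = 1 / t_rc   # bits/s if every access opens a new row
page_rate = 1 / t_pc   # bits/s streaming columns within one row
print(f"row-cycle rate: {row_rate / 1e6:.1f} Mbit/s per chip")
print(f"page-mode rate: {page_rate / 1e6:.1f} Mbit/s per chip")
```

The roughly 3x ratio between the two rates is the motivation for the fast-page-mode organization described later in the lecture.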
CS 152 Lec16.31: DRAM Write Timing
Every DRAM access begins with the assertion of RAS_L. There are two ways to write: early or late relative to CAS.
[Timing diagram for a 256K x 8 DRAM (9 address pins, 8 data pins): RAS_L falls with the row address on A, then CAS_L falls with the column address. Early write cycle: WE_L asserted before CAS_L; late write cycle: WE_L asserted after CAS_L.]

CS 152 Lec16.32: DRAM Read Timing
Every DRAM access begins with the assertion of RAS_L. There are two ways to read: early or late relative to CAS.
[Timing diagram: early read cycle: OE_L asserted before CAS_L; late read cycle: OE_L asserted after CAS_L. Data Out is valid after the output-enable delay and high-Z otherwise.]

CS 152 Lec16.33: Main Memory Performance
- Simple: CPU, cache, bus, and memory are all the same width (32 bits)
- Wide: CPU/mux 1 word; mux/cache, bus, and memory N words (Alpha: 64 bits & 256 bits)
- Interleaved: CPU, cache, and bus 1 word; memory has N modules (4 modules here); the example is word-interleaved

CS 152 Lec16.34: Cycle Time versus Access Time
- DRAM (read/write) cycle time >> DRAM (read/write) access time (2:1; why?)
- DRAM (read/write) cycle time: how frequently can you initiate an access?
- DRAM (read/write) access time: how quickly will you get what you want once you initiate an access?
- DRAM bandwidth limitation: how much data can you get from the memory?

CS 152 Lec16.35: Increasing Bandwidth: Interleaving
Access pattern without interleaving: the CPU must wait a full memory cycle between starting the access for D1 and starting the access for D2.
Access pattern with 4-way interleaving: access Bank 0, then Bank 1, then Bank 2, then Bank 3; by then we can access Bank 0 again.

CS 152 Lec16.36: Main Memory Performance
Timing model to access one 4-word cache block: 1 cycle to send the address, 6 cycles access time, 1 cycle to send data.
- Simple memory: miss penalty = 4 x (1 + 6 + 1) = 32
- Wide memory: miss penalty = 1 + 6 + 1 = 8
- Interleaved memory: miss penalty = 1 + 6 + 4 x 1 = 11
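The three miss penalties in the timing model above fall out of one-line formulas. A minimal Python sketch of the slide's model (the variable names are my own):

```python
# Miss penalty for a 4-word cache block under the three organizations:
# 1 cycle to send the address, 6 cycles of DRAM access time,
# 1 cycle to transfer each word (or block) on the bus.
addr, access, xfer, words = 1, 6, 1, 4

simple      = words * (addr + access + xfer)   # one narrow access per word
wide        = addr + access + xfer             # whole block in one access
interleaved = addr + access + words * xfer     # banks overlap their accesses
print(simple, wide, interleaved)               # 32 8 11
```

Interleaving gets close to the wide organization's penalty while keeping a one-word bus, which is why it was the common compromise.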
CS 152 Lec16.37: Independent Memory Banks
How many banks?
- Number of banks >= number of clocks to access a word in a bank
- This is for sequential accesses; otherwise you will return to the original bank before it has the next word ready
- Increasing DRAM size => fewer chips => harder to have enough banks

CS 152 Lec16.38: DRAM Growth and Minimum PC Memory Size
- Bits/chip of DRAM grow 50%-60%/yr
- Nathan Myhrvold (Microsoft): mature software grows 33%/yr (for NT)
- MB/$ of DRAM grows 25%-30%/yr (from Pete MacWilliams, Intel)
[Table, partially garbled: minimum PC memory size (4 MB up through 256 MB) against DRAM generation (1 Mb, 4 Mb, 16 Mb, 64 Mb, 256 Mb, 1 Gb). Because DRAM bits/chip grow ~60%/year while memory per system grows only 25%-30%/year, the number of DRAMs per system keeps falling (e.g., 32 chips, then 8, then 2).]

CS 152 Lec16.39: Page Mode DRAM: Motivation
Regular DRAM organization:
- N rows x N columns x M bits
- Read and write M bits at a time
- Each M-bit access requires a RAS/CAS cycle
Fast page mode DRAM:
- An N x M register saves a whole row

CS 152 Lec16.40: Fast Page Mode Operation
- After a row is read into the N x M register, only CAS is needed to access other M-bit blocks on that row
- RAS_L remains asserted while CAS_L is toggled
[Timing: a regular DRAM needs a full RAS_L cycle for each M-bit access; in page mode, one RAS_L assertion covers the 1st, 2nd, 3rd, and 4th M-bit accesses with successive column addresses.]

CS 152 Lec16.41: DRAM vs. Desktop Microprocessor Cultures

                   DRAM                      Microprocessor
Standards          pinout, package,          binary compatibility,
                   refresh rate, capacity    IEEE 754, I/O bus
Sources            Multiple                  Single
Figures of merit   1) capacity, 2) $/bit,    1) SPEC speed,
                   3) BW, 4) latency         2) cost
Improve rate/year  1) 60%, 2) 25%,           1) 60%,
                   3) 20%, 4) 7%             2) little change

CS 152 Lec16.42: DRAM Design Goals
- Reduce cell size 2.5x, increase die size 1.5x
- Sell 10% of a single DRAM generation: 6.25 billion DRAMs sold
- Phases: engineering samples, first customer ship (FCS), mass production
- Fastest to FCS and mass production wins share
- Die size, testing time, and yield => profit; yield >> 60% (redundant rows/columns to repair flaws)
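The bank-count rule above ("number of banks >= clocks to access a word") can be checked with a tiny round-robin model. This is my own sketch, assuming one address is issued per clock to bank i mod B and each bank stays busy for A clocks after an access:

```python
# Stream n_words sequential words across `banks` interleaved banks,
# each needing `access_clocks` to deliver a word. Returns the clock at
# which the last word's data is ready.
def stream_time(n_words, banks, access_clocks):
    free_at = [0] * banks                  # clock when each bank is next free
    t = 0
    for i in range(n_words):
        b = i % banks
        t = max(t + 1, free_at[b])         # issue one access per clock, or stall
        free_at[b] = t + access_clocks
    return free_at[(n_words - 1) % banks]

# 6-clock banks: 6 banks keep the stream stall-free, 4 banks do not.
print(stream_time(8, 6, 6), stream_time(8, 4, 6))
```

With 6 banks the addresses issue one per clock and the last of 8 words finishes at clock 14; with only 4 banks the stream returns to a still-busy bank and finishes later, which is exactly the hazard the slide warns about.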
CS 152 Lec16.43: DRAM History
- DRAMs: capacity +60%/yr, cost -30%/yr
- 2.5X cells/area, 1.5X die size in 3 years
- A '97 DRAM fab line costs $1B to $2B
- DRAM-only fabs: optimized for density, leakage vs. speed
- Rely on an increasing number of computers and more memory per computer (60% of the market)
  - The SIMM or DIMM is the replaceable unit => computers can use any generation of DRAM
- Commodity, second-source industry => high volume, low profit, conservative
  - Little organization innovation in 20 years: page mode, EDO, synchronous DRAM
- Order of importance: 1) cost/bit, 2) capacity
  - RAMBUS: 10X BW, but at what cost increase?

CS 152 Lec16.44: Summary
Two different types of locality:
- Temporal locality (locality in time): if an item is referenced, it will tend to be referenced again soon.
- Spatial locality (locality in space): if an item is referenced, items whose addresses are close by tend to be referenced soon.
By taking advantage of the principle of locality:
- Present the user with as much memory as is available in the cheapest technology.
- Provide access at the speed offered by the fastest technology.
DRAM is slow but cheap and dense: a good choice for presenting the user with a BIG memory system.
SRAM is fast but expensive and not very dense: a good choice for providing the user FAST access time.

CS 152 Lec16.45: Summary: Processor-Memory Performance Gap "Tax"

Processor         % Area (~cost)   % Transistors (~power)
Alpha 21164       37%              77%
StrongArm SA110   61%              94%
Pentium Pro       64%              88%

(Pentium Pro uses 2 dies per package: Proc/I$/D$ + L2$.)
Caches have no inherent value; they only try to close the performance gap.
More informationCPE 631 Lecture 04: CPU Caches
Lecture 04 CPU Caches Electrical and Computer Engineering University of Alabama in Huntsville Outline Memory Hierarchy Four Questions for Memory Hierarchy Cache Performance 26/01/2004 UAH- 2 1 Processor-DR
More informationLecture-14 (Memory Hierarchy) CS422-Spring
Lecture-14 (Memory Hierarchy) CS422-Spring 2018 Biswa@CSE-IITK The Ideal World Instruction Supply Pipeline (Instruction execution) Data Supply - Zero-cycle latency - Infinite capacity - Zero cost - Perfect
More informationUNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Computer Architecture ECE 568/668
UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering Computer Architecture ECE 568/668 Part 11 Memory Hierarchy - I Israel Koren ECE568/Koren Part.11.1 ECE568/Koren Part.11.2 Ideal Memory
More informationChapter 5 Large and Fast: Exploiting Memory Hierarchy (Part 1)
Department of Electr rical Eng ineering, Chapter 5 Large and Fast: Exploiting Memory Hierarchy (Part 1) 王振傑 (Chen-Chieh Wang) ccwang@mail.ee.ncku.edu.tw ncku edu Depar rtment of Electr rical Engineering,
More informationChapter 5. Large and Fast: Exploiting Memory Hierarchy
Chapter 5 Large and Fast: Exploiting Memory Hierarchy Processor-Memory Performance Gap 10000 µproc 55%/year (2X/1.5yr) Performance 1000 100 10 1 1980 1983 1986 1989 Moore s Law Processor-Memory Performance
More informationCPU issues address (and data for write) Memory returns data (or acknowledgment for write)
The Main Memory Unit CPU and memory unit interface Address Data Control CPU Memory CPU issues address (and data for write) Memory returns data (or acknowledgment for write) Memories: Design Objectives
More informationTextbook: Burdea and Coiffet, Virtual Reality Technology, 2 nd Edition, Wiley, Textbook web site:
Textbook: Burdea and Coiffet, Virtual Reality Technology, 2 nd Edition, Wiley, 2003 Textbook web site: www.vrtechnology.org 1 Textbook web site: www.vrtechnology.org Laboratory Hardware 2 Topics 14:332:331
More informationChapter Seven. Large & Fast: Exploring Memory Hierarchy
Chapter Seven Large & Fast: Exploring Memory Hierarchy 1 Memories: Review SRAM (Static Random Access Memory): value is stored on a pair of inverting gates very fast but takes up more space than DRAM DRAM
More informationMultilevel Memories. Joel Emer Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology
1 Multilevel Memories Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology Based on the material prepared by Krste Asanovic and Arvind CPU-Memory Bottleneck 6.823
More informationComputer Architecture A Quantitative Approach, Fifth Edition. Chapter 2. Memory Hierarchy Design. Copyright 2012, Elsevier Inc. All rights reserved.
Computer Architecture A Quantitative Approach, Fifth Edition Chapter 2 Memory Hierarchy Design 1 Introduction Programmers want unlimited amounts of memory with low latency Fast memory technology is more
More informationLRU. Pseudo LRU A B C D E F G H A B C D E F G H H H C. Copyright 2012, Elsevier Inc. All rights reserved.
LRU A list to keep track of the order of access to every block in the set. The least recently used block is replaced (if needed). How many bits we need for that? 27 Pseudo LRU A B C D E F G H A B C D E
More informationCOEN-4730 Computer Architecture Lecture 3 Review of Caches and Virtual Memory
1 COEN-4730 Computer Architecture Lecture 3 Review of Caches and Virtual Memory Cristinel Ababei Dept. of Electrical and Computer Engineering Marquette University Credits: Slides adapted from presentations
More informationLecture 18: Memory Hierarchy Main Memory and Enhancing its Performance Professor Randy H. Katz Computer Science 252 Spring 1996
Lecture 18: Memory Hierarchy Main Memory and Enhancing its Performance Professor Randy H. Katz Computer Science 252 Spring 1996 RHK.S96 1 Review: Reducing Miss Penalty Summary Five techniques Read priority
More informationMemory Hierarchy Computing Systems & Performance MSc Informatics Eng. Memory Hierarchy (most slides are borrowed)
Computing Systems & Performance Memory Hierarchy MSc Informatics Eng. 2011/12 A.J.Proença Memory Hierarchy (most slides are borrowed) AJProença, Computer Systems & Performance, MEI, UMinho, 2011/12 1 2
More informationMemory Hierarchy Computing Systems & Performance MSc Informatics Eng. Memory Hierarchy (most slides are borrowed)
Computing Systems & Performance Memory Hierarchy MSc Informatics Eng. 2012/13 A.J.Proença Memory Hierarchy (most slides are borrowed) AJProença, Computer Systems & Performance, MEI, UMinho, 2012/13 1 2
More informationChapter 7 Large and Fast: Exploiting Memory Hierarchy. Memory Hierarchy. Locality. Memories: Review
Memories: Review Chapter 7 Large and Fast: Exploiting Hierarchy DRAM (Dynamic Random Access ): value is stored as a charge on capacitor that must be periodically refreshed, which is why it is called dynamic
More informationChapter 5. Large and Fast: Exploiting Memory Hierarchy
Chapter 5 Large and Fast: Exploiting Memory Hierarchy Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM) 50ns 70ns, $20 $75 per GB Magnetic disk 5ms 20ms, $0.20 $2 per
More informationCopyright 2012, Elsevier Inc. All rights reserved.
Computer Architecture A Quantitative Approach, Fifth Edition Chapter 2 Memory Hierarchy Design 1 Introduction Introduction Programmers want unlimited amounts of memory with low latency Fast memory technology
More informationMainstream Computer System Components
Mainstream Computer System Components Double Date Rate (DDR) SDRAM One channel = 8 bytes = 64 bits wide Current DDR3 SDRAM Example: PC3-2800 (DDR3-600) 200 MHz (internal base chip clock) 8-way interleaved
More informationWhere We Are in This Course Right Now. ECE 152 Introduction to Computer Architecture. This Unit: Caches and Memory Hierarchies.
Introduction to Computer Architecture Caches and emory Hierarchies Copyright 2012 Daniel J. Sorin Duke University Slides are derived from work by Amir Roth (Penn) and Alvin Lebeck (Duke) Spring 2012 Where
More informationComputer Architecture. A Quantitative Approach, Fifth Edition. Chapter 2. Memory Hierarchy Design. Copyright 2012, Elsevier Inc. All rights reserved.
Computer Architecture A Quantitative Approach, Fifth Edition Chapter 2 Memory Hierarchy Design 1 Programmers want unlimited amounts of memory with low latency Fast memory technology is more expensive per
More informationEEC 170 Computer Architecture Fall Improving Cache Performance. Administrative. Review: The Memory Hierarchy. Review: Principle of Locality
Administrative EEC 7 Computer Architecture Fall 5 Improving Cache Performance Problem #6 is posted Last set of homework You should be able to answer each of them in -5 min Quiz on Wednesday (/7) Chapter
More informationMemory Hierarchy and Caches
Memory Hierarchy and Caches COE 301 / ICS 233 Computer Organization Dr. Muhamed Mudawar College of Computer Sciences and Engineering King Fahd University of Petroleum and Minerals Presentation Outline
More informationEITF20: Computer Architecture Part4.1.1: Cache - 2
EITF20: Computer Architecture Part4.1.1: Cache - 2 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Cache performance optimization Bandwidth increase Reduce hit time Reduce miss penalty Reduce miss
More informationCSF Cache Introduction. [Adapted from Computer Organization and Design, Patterson & Hennessy, 2005]
CSF Cache Introduction [Adapted from Computer Organization and Design, Patterson & Hennessy, 2005] Review: The Memory Hierarchy Take advantage of the principle of locality to present the user with as much
More informationLECTURE 5: MEMORY HIERARCHY DESIGN
LECTURE 5: MEMORY HIERARCHY DESIGN Abridged version of Hennessy & Patterson (2012):Ch.2 Introduction Programmers want unlimited amounts of memory with low latency Fast memory technology is more expensive
More informationAdapted from David Patterson s slides on graduate computer architecture
Mei Yang Adapted from David Patterson s slides on graduate computer architecture Introduction Ten Advanced Optimizations of Cache Performance Memory Technology and Optimizations Virtual Memory and Virtual
More informationCS6303 Computer Architecture Regulation 2013 BE-Computer Science and Engineering III semester 2 MARKS
CS6303 Computer Architecture Regulation 2013 BE-Computer Science and Engineering III semester 2 MARKS UNIT-I OVERVIEW & INSTRUCTIONS 1. What are the eight great ideas in computer architecture? The eight
More informationCopyright 2012, Elsevier Inc. All rights reserved.
Computer Architecture A Quantitative Approach, Fifth Edition Chapter 2 Memory Hierarchy Design 1 Introduction Programmers want unlimited amounts of memory with low latency Fast memory technology is more
More informationMemory Hierarchy Basics. Ten Advanced Optimizations. Small and Simple
Memory Hierarchy Basics Six basic cache optimizations: Larger block size Reduces compulsory misses Increases capacity and conflict misses, increases miss penalty Larger total cache capacity to reduce miss
More informationLocality. Cache. Direct Mapped Cache. Direct Mapped Cache
Locality A principle that makes having a memory hierarchy a good idea If an item is referenced, temporal locality: it will tend to be referenced again soon spatial locality: nearby items will tend to be
More informationLecture 17 Introduction to Memory Hierarchies" Why it s important " Fundamental lesson(s)" Suggested reading:" (HP Chapter
Processor components" Multicore processors and programming" Processor comparison" vs." Lecture 17 Introduction to Memory Hierarchies" CSE 30321" Suggested reading:" (HP Chapter 5.1-5.2)" Writing more "
More informationQuestion?! Processor comparison!
1! 2! Suggested Readings!! Readings!! H&P: Chapter 5.1-5.2!! (Over the next 2 lectures)! Lecture 18" Introduction to Memory Hierarchies! 3! Processor components! Multicore processors and programming! Question?!
More informationSpeicherarchitektur. Who Cares About the Memory Hierarchy? Technologie-Trends. Speicher-Hierarchie. Referenz-Lokalität. Caches
11 Speicherarchitektur Speicher-Hierarchie Referenz-Lokalität Caches 1 Technologie-Trends Kapazitäts- Geschwindigkeitssteigerung Logik 2 fach in 3 Jahren 2 fach in 3 Jahren DRAM 4 fach in 3 Jahren 1.4
More information