CS 152 Computer Architecture and Engineering
Lecture 6 - Memory

Krste Asanovic
Electrical Engineering and Computer Sciences
University of California at Berkeley
http://www.eecs.berkeley.edu/~krste
http://inst.eecs.berkeley.edu/~cs152

Last time in Lecture 5
  Control hazards (branches, interrupts) are most difficult to handle as they change which instruction should be executed next
  Speculation commonly used to reduce effect of control hazards (predict sequential fetch, predict no exceptions)
  Branch delay slots make control hazard visible to software
  Precise exceptions: stop cleanly on one instruction, all previous instructions completed, no following instructions have changed architectural state
  To implement precise exceptions in pipeline, shift faulting instructions down pipeline to commit point, where exceptions are handled in program order
CPU-Memory Bottleneck
  [Figure: CPU connected to Memory]
  Performance of high-speed computers is usually limited by memory bandwidth & latency
  Latency (time for a single access)
    Memory access time >> Processor cycle time
  Bandwidth (number of accesses per unit time)
    if fraction m of instructions access memory,
    ⇒ 1+m memory references / instruction
    ⇒ CPI = 1 requires 1+m memory refs / cycle (assuming MIPS RISC ISA)

Core Memory
  Core memory was first large scale reliable main memory
    invented by Forrester in late 40s/early 50s at MIT for Whirlwind project
  Bits stored as magnetization polarity on small ferrite cores threaded onto 2 dimensional grid of wires
  Coincident current pulses on X and Y wires would write cell and also sense original state (destructive reads)
  Robust, non-volatile storage
  Used on space shuttle computers until recently
  Cores threaded onto wires by hand (25 billion a year at peak production)
  Core access time ~ 1µs
  [Photo: DEC PDP-8/E Board, 4K words x 12 bits, (1968)]
Semiconductor Memory, DRAM
  Semiconductor memory began to be competitive in early 1970s
    Intel formed to exploit market for semiconductor memory
  First commercial DRAM was Intel 1103
    1Kbit of storage on single chip
    charge on a capacitor used to hold value
  Semiconductor memory quickly replaced core in 70s

One Transistor Dynamic RAM
  [Figure: 1-T DRAM cell. Schematic labels: word line, bit line, access transistor, storage capacitor (FET gate, trench, stack), V REF. Cross-section labels: TiN top electrode (V REF), Ta2O5 dielectric, W bottom electrode, poly word line, access transistor.]
DRAM Architecture
  [Figure: DRAM array. An N+M bit address is split into N row-address bits (row address decoder driving word lines, Row 1 to Row 2^N) and M column-address bits (column decoder & sense amplifiers on the bit lines, Col. 1 to Col. 2^M). Each memory cell stores one bit; data (D) leaves through the column decoder.]
  Bits stored in 2-dimensional arrays on chip
  Modern chips have around 4 logical banks on each chip
    each logical bank physically implemented as many smaller arrays

DRAM Packaging
  [Figure: DRAM chip pins: ~7 clock and control signals; ~12 address lines (multiplexed row/column address); data bus (4b, 8b, 16b, 32b).]
  DIMM (Dual Inline Memory Module) contains multiple chips with clock/control/address signals connected in parallel (sometimes need buffers to drive signals to all chips)
  Data pins work together to return wide word (e.g., 64-bit data bus using 16x4-bit parts)
DRAM Operation
  Three steps in read/write access to a given bank
  Row access (RAS)
    decode row address, enable addressed row (often multiple Kb in row)
    bitlines share charge with storage cell
    small change in voltage detected by sense amplifiers which latch whole row of bits
    sense amplifiers drive bitlines full rail to recharge storage cells
  Column access (CAS)
    decode column address to select small number of sense amplifier latches (4, 8, 16, or 32 bits depending on DRAM package)
    on read, send latched bits out to chip pins
    on write, change sense amplifier latches which then charge storage cells to required value
    can perform multiple column accesses on same row without another row access (burst mode)
  Precharge
    charges bit lines to known value, required before next row access
  Each step has a latency of around 15-20ns in modern DRAMs
  Various DRAM standards (DDR, RDRAM) have different ways of encoding the signals for transmission to the DRAM, but all share same core architecture

Double-Data Rate (DDR2) DRAM
  [Figure: timing diagram showing Row, Column, Precharge, and Row commands with a 200MHz clock and a 400Mb/s data rate. Source: Micron, 256Mb DDR2 SDRAM datasheet.]
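
The three steps in the DRAM Operation slide can be turned into a rough latency model. The following is a minimal sketch, not a description of any real DRAM controller: the class name DramBank, the constant STEP_NS (15 ns, the low end of the 15-20ns range quoted above), and the policy of leaving the last row open after each access are all assumptions made for illustration.

```python
STEP_NS = 15  # assumed latency per step; the slide quotes roughly 15-20 ns


class DramBank:
    """Toy model of one DRAM bank that leaves the last-used row open (assumed policy)."""

    def __init__(self):
        self.open_row = None  # None means the bit lines are already precharged

    def access(self, row):
        """Return the latency in ns of one column access to `row`."""
        if self.open_row == row:
            # Row is already latched in the sense amplifiers: column access only.
            return STEP_NS
        latency = 0
        if self.open_row is not None:
            latency += STEP_NS  # precharge before a different row can be opened
        latency += STEP_NS      # row access (RAS)
        latency += STEP_NS      # column access (CAS)
        self.open_row = row
        return latency


bank = DramBank()
print(bank.access(3))  # first access: row + column
print(bank.access(3))  # same row again: column only
print(bank.access(7))  # different row: precharge + row + column
```

Under these assumptions a repeated access to the open row costs one step, while switching rows costs three, which is why multiple column accesses on the same row are attractive.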
Processor-DRAM Gap (latency)
  [Figure: performance (log scale, 1 to 1000) versus time (1980 to 2000). CPU: µProc 60%/year ("Moore's Law"). DRAM: 7%/year. Processor-Memory Performance Gap: grows 50% / year.]
  Four-issue 2GHz superscalar accessing 100ns DRAM could execute 800 instructions during time for one memory access!

Typical Memory Reference Patterns
  [Figure: address versus time, with separate bands for instruction fetches, stack accesses, and data accesses.]
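
The 800-instruction figure on the Processor-DRAM Gap slide is the issue width multiplied by the number of clock cycles in one DRAM access. A small sketch of that arithmetic follows; the function name and the choice of MHz and ns as units are illustrative assumptions.

```python
def instructions_per_memory_access(issue_width, clock_mhz, dram_latency_ns):
    """Peak instruction slots that elapse while one DRAM access is outstanding."""
    # cycles = (clock_mhz * 1e6 Hz) * (dram_latency_ns * 1e-9 s), kept in integers
    cycles = clock_mhz * dram_latency_ns // 1000
    return issue_width * cycles


# Four-issue, 2 GHz, 100 ns DRAM: 200 cycles per access, 4 instructions per cycle
print(instructions_per_memory_access(4, 2000, 100))  # 800
```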
Common Predictable Patterns
  Two predictable properties of memory references:
  Temporal Locality: If a location is referenced it is likely to be referenced again in the near future.
  Spatial Locality: If a location is referenced it is likely that locations near it will be referenced in the near future.

Memory Reference Patterns
  [Figure: memory address versus time, one dot per access, with regions illustrating spatial locality and temporal locality.]
  Donald J. Hatfield, Jeanette Gerald: Program Restructuring for Virtual Memory. IBM Systems Journal 10(3): 168-192 (1971)
Multilevel Memory
  Strategy: Reduce average latency using small, fast memories called caches.
  Caches are a mechanism to reduce memory latency based on the empirical observation that the patterns of memory references made by a processor are often highly predictable:

    PC    Instruction
    96    ...
    100   loop: ADD  r2, r1, r1
    104         SUBI r3, r3, #1
    108         BNEZ r3, loop
    112   ...

Memory Hierarchy
  [Figure: CPU <-> (A) Small, Fast Memory (RF, SRAM) <-> (B) Big, Slow Memory (DRAM). The small, fast memory holds frequently used data.]
  capacity: Register << SRAM << DRAM (why?)
  latency: Register << SRAM << DRAM (why?)
  bandwidth: on-chip >> off-chip (why?)
  On a data access:
    hit (data ∈ fast memory) ⇒ low latency access
    miss (data ∉ fast memory) ⇒ long latency access (DRAM)
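
The hit/miss distinction above suggests a simple way to estimate the average latency a cache delivers: weight the two latencies by how often each case occurs. The sketch below assumes a single cache level and treats miss_latency as the total time of a missing access; the function name and the example numbers (1 ns hit, 100 ns miss) are hypothetical and not taken from the slides.

```python
def average_latency(hit_rate, hit_latency, miss_latency):
    """Weighted average access latency for a single-level cache (simplified model)."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * hit_latency + (1.0 - hit_rate) * miss_latency


# Hypothetical numbers: even a modest miss rate dominates the average.
for rate in (0.5, 0.9, 0.99):
    print(rate, average_latency(rate, 1.0, 100.0))
```

With these assumed numbers, the average stays far above the hit latency until the hit rate is very close to 1, which is why the predictability of reference patterns matters so much.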
Relative Memory Cell Sizes
  [Figure: on-chip SRAM in logic chip compared with DRAM on memory chip. Source: Foss, Implementing Application-Specific Memory, ISSCC 1996.]

Management of Memory Hierarchy
  Small/fast storage, e.g., registers
    Address usually specified in instruction
    Generally implemented directly as a register file
      » but hardware might do things behind software's back, e.g., stack management, register renaming
  Large/slower storage, e.g., memory
    Address usually computed from values in register
    Generally implemented as a cache hierarchy
      » hardware decides what is kept in fast memory
      » but software may provide hints, e.g., don't cache or prefetch
CS152 Administrivia
  Quiz 1 Thursday in class (306 Soda)
    Lectures 1-5, closed book, no calculators or computers
  Krste, special office hours, Wednesday 2/11, 2-3pm, 579 Soda Hall (Par Lab)
  Scott, special office hours, Wednesday 2/11, 4-5pm, 711 Soda Hall
  Next week lecture 2/17 back in 320 Soda

Caches
  Caches exploit both types of predictability:
  Exploit temporal locality by remembering the contents of recently accessed locations.
  Exploit spatial locality by fetching blocks of data around recently accessed locations.
Inside a Cache
  [Figure: the processor exchanges addresses and data with the cache, and the cache does the same with main memory. Each cache line holds an address tag (e.g., 100, 304, 6848, 416) and a data block of several data bytes; the lines shown hold copies of main memory locations 100 and 101.]

Cache Algorithm (Read)
  Look at Processor Address, search cache tags to find match. Then either:
    Found in cache, a.k.a. HIT
      Return copy of data from cache
    Not in cache, a.k.a. MISS
      Read block of data from Main Memory
      Wait ...
      Return data to processor and update cache
  Q: Which line do we replace?
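
The read algorithm above can be sketched in a few lines. This is an illustrative model only: the class TinyCache, the 4-byte BLOCK_SIZE, and the use of a Python dict as the tag store are assumptions, and the eviction rule (drop the oldest-inserted line) is just a placeholder answer to the slide's question about which line to replace.

```python
BLOCK_SIZE = 4  # bytes per block (assumed for illustration)


class TinyCache:
    """Fully associative read-only cache over a list acting as main memory."""

    def __init__(self, memory, num_lines):
        self.memory = memory        # backing store: list of byte values
        self.num_lines = num_lines  # capacity in lines
        self.lines = {}             # tag (block number) -> list of bytes
        self.hits = 0
        self.misses = 0

    def read(self, address):
        tag, offset = divmod(address, BLOCK_SIZE)
        block = self.lines.get(tag)           # search cache tags for a match
        if block is not None:                 # HIT: return copy from cache
            self.hits += 1
            return block[offset]
        self.misses += 1                      # MISS: go to main memory
        if len(self.lines) >= self.num_lines:
            # Placeholder replacement choice: evict the oldest-inserted line
            # (dicts keep insertion order in Python 3.7+).
            del self.lines[next(iter(self.lines))]
        start = tag * BLOCK_SIZE
        block = self.memory[start:start + BLOCK_SIZE]  # read whole block
        self.lines[tag] = block               # update cache
        return block[offset]                  # return data to processor


cache = TinyCache(memory=list(range(64)), num_lines=2)
cache.read(5)   # miss: fetches block 1 (addresses 4..7)
cache.read(6)   # hit: same block, spatial locality
print(cache.hits, cache.misses)
```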
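Placement Policy
  [Figure: memory with block numbers 0 to 31; an 8-block cache (blocks 0 to 7), shown also as 4 sets (set numbers 0 to 3) of 2 blocks each.]
  Block 12 can be placed:
    Fully Associative: anywhere
    (2-way) Set Associative: anywhere in set 0 (12 mod 4)
    Direct Mapped: only into block 4 (12 mod 8)

Direct-Mapped Cache
  [Figure: the address is split into tag (t bits), index (k bits), and block offset (b bits). The index selects one of 2^k lines, each holding a valid bit (V), a tag, and a data block. The stored tag is compared (=) with the address tag to produce HIT, and the block offset selects the data word or byte.]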
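
The three placement rules for block 12 all follow from one calculation: the set number is the block number modulo the number of sets, and the associativity fixes how many cache blocks each set contains. The sketch below assumes the 8-block cache from the figure; the function name and the convention that set s occupies cache blocks s*ways to s*ways + ways - 1 are illustrative choices.

```python
def placement(block_number, num_cache_blocks, ways):
    """Return (set_number, candidate_cache_blocks) for a memory block."""
    num_sets = num_cache_blocks // ways
    set_number = block_number % num_sets
    # Assumed numbering: the blocks of one set are adjacent in the cache.
    first = set_number * ways
    return set_number, list(range(first, first + ways))


print(placement(12, 8, ways=1))  # direct mapped: (4, [4])
print(placement(12, 8, ways=2))  # 2-way set associative: (0, [0, 1])
print(placement(12, 8, ways=8))  # fully associative: (0, [0, 1, 2, 3, 4, 5, 6, 7])
```

Direct mapped and fully associative are the two extremes of the same scheme: one block per set, or one set holding every block.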
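Direct Map Address Selection
  higher-order vs. lower-order address bits
  [Figure: direct-mapped cache lookup with 2^k lines (V, tag, data block), a tag comparison (=) producing HIT, and a b-bit block offset selecting the data word or byte; the slide contrasts taking the k-bit index from higher-order versus lower-order address bits.]

2-Way Set-Associative Cache
  [Figure: the k-bit index selects one set; both ways of the set (each with V, tag, and data block) are compared (=) with the address tag in parallel. A match in either way signals HIT, and the b-bit block offset selects the data word or byte.]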
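
The choice between higher-order and lower-order index bits can be explored with a short sketch. Both functions below split an address into (tag, index, offset) using k index bits and b block-offset bits; the function names and the 16-bit address width in the example are assumptions for illustration.

```python
def split_low_index(address, k, b):
    """Index taken from the bits just above the block offset."""
    offset = address & ((1 << b) - 1)
    index = (address >> b) & ((1 << k) - 1)
    tag = address >> (b + k)
    return tag, index, offset


def split_high_index(address, k, b, address_bits):
    """Index taken from the top k bits of an address_bits-wide address."""
    t = address_bits - k - b  # remaining bits form the tag
    offset = address & ((1 << b) - 1)
    tag = (address >> b) & ((1 << t) - 1)
    index = address >> (b + t)
    return tag, index, offset


# Four consecutive 4-byte blocks, 8 lines (k=3), 16-bit addresses
addresses = [0, 4, 8, 12]
print([split_low_index(a, 3, 2)[1] for a in addresses])       # [0, 1, 2, 3]
print([split_high_index(a, 3, 2, 16)[1] for a in addresses])  # [0, 0, 0, 0]
```

In this example, neighbouring blocks spread across different lines when the index uses lower-order bits, but all compete for line 0 when it uses higher-order bits, so a program with spatial locality would keep evicting its own data.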
Fully Associative Cache
  [Figure: the address is split into a tag and a b-bit block offset. The address tag is compared (=) in parallel with the tag of every line (each with V, tag, and data block); a match signals HIT and the block offset selects the data word or byte.]

Replacement Policy
  In an associative cache, which block from a set should be evicted when the set becomes full?
  Random
  Least Recently Used (LRU)
    LRU cache state must be updated on every access
    true implementation only feasible for small sets (2-way)
    pseudo-LRU binary tree often used for 4-8 way
  First In, First Out (FIFO) a.k.a. Round-Robin
    used in highly associative caches
  Not Least Recently Used (NLRU)
    FIFO with exception for most recently used block or blocks
  This is a second-order effect. Why?
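
True LRU for a single set can be sketched in software with an ordered map. This is a behavioural model only, not how hardware tracks recency; the class name LruSet and the use of Python's collections.OrderedDict are choices made for this illustration.

```python
from collections import OrderedDict


class LruSet:
    """One cache set with true LRU replacement (behavioural model)."""

    def __init__(self, ways):
        self.ways = ways
        self.blocks = OrderedDict()  # tag -> None, ordered oldest to newest use

    def access(self, tag):
        """Touch `tag`; return True on a hit, False on a miss."""
        if tag in self.blocks:
            self.blocks.move_to_end(tag)     # LRU state updated on every access
            return True
        if len(self.blocks) >= self.ways:
            self.blocks.popitem(last=False)  # evict the least recently used tag
        self.blocks[tag] = None
        return False


s = LruSet(ways=2)
print([s.access(t) for t in ["A", "B", "A", "C", "B"]])
# [False, False, True, False, False]: C evicts B because A was used more recently
```

Note that every hit reorders the state, which mirrors the slide's point that LRU state must be updated on every access.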
Acknowledgements
  These slides contain material developed and copyright by:
    Arvind (MIT)
    Krste Asanovic (MIT/UCB)
    Joel Emer (Intel/MIT)
    James Hoe (CMU)
    John Kubiatowicz (UCB)
    David Patterson (UCB)
  MIT material derived from course 6.823
  UCB material derived from course CS252