Opportunities and Challenges of Emerging Memory Technologies

1 Opportunities and Challenges of Emerging Memory Technologies. Onur Mutlu, September 11, 2017, ARM Research Summit

2 The Main Memory System. Processors and caches; Main Memory; Storage (SSD/HDD). Main memory is a critical component of all computing systems: server, mobile, embedded, desktop, sensor. The main memory system must scale (in size, technology, efficiency, cost, and management algorithms) to maintain performance growth and technology scaling benefits.

3 The Main Memory System. FPGAs; Main Memory; Storage (SSD/HDD). Main memory is a critical component of all computing systems: server, mobile, embedded, desktop, sensor. The main memory system must scale (in size, technology, efficiency, cost, and management algorithms) to maintain performance growth and technology scaling benefits.

4 The Main Memory System. GPUs; Main Memory; Storage (SSD/HDD). Main memory is a critical component of all computing systems: server, mobile, embedded, desktop, sensor. The main memory system must scale (in size, technology, efficiency, cost, and management algorithms) to maintain performance growth and technology scaling benefits.

5 Memory System: A Shared Resource View. Most of the system is dedicated to storing and moving data.

6 State of the Main Memory System. Recent technology, architecture, and application trends lead to new requirements and exacerbate old requirements. DRAM and memory controllers, as we know them today, are (will be) unlikely to satisfy all requirements. Some emerging non-volatile memory technologies (e.g., PCM) enable new opportunities: memory+storage merging. We need to rethink the main memory system to fix DRAM issues and enable emerging technologies to satisfy all requirements.

7 Major Trends Affecting Main Memory (I). Need for main memory capacity, bandwidth, QoS increasing. Main memory energy/power is a key system design concern. DRAM technology scaling is ending.

8 Major Trends Affecting Main Memory (II). Need for main memory capacity, bandwidth, QoS increasing. Multi-core: increasing number of cores/agents. Data-intensive applications: increasing demand/hunger for data. Consolidation: cloud computing, GPUs, mobile, heterogeneity. Main memory energy/power is a key system design concern. DRAM technology scaling is ending.

9 Example: The Memory Capacity Gap. Core count doubling ~every 2 years; DRAM DIMM capacity doubling ~every 3 years (Lim et al., ISCA 2009). Memory capacity per core expected to drop by 30% every two years. Trends worse for memory bandwidth per core!

10 Major Trends Affecting Main Memory (III). Need for main memory capacity, bandwidth, QoS increasing. Main memory energy/power is a key system design concern: ~40-50% of energy is spent in the off-chip memory hierarchy [Lefurgy, IEEE Computer '03]; >40% of power is in DRAM [Ware, HPCA '10][Paul, ISCA '15]; DRAM consumes power even when not used (periodic refresh). DRAM technology scaling is ending.

11 Major Trends Affecting Main Memory (IV). Need for main memory capacity, bandwidth, QoS increasing. Main memory energy/power is a key system design concern. DRAM technology scaling is ending: ITRS projects DRAM will not scale easily below X nm. Scaling has provided many benefits: higher capacity (density), lower cost, lower energy.

12 Major Trends Affecting Main Memory (V). DRAM scaling has already become increasingly difficult: increasing cell leakage current, reduced cell reliability, increasing manufacturing difficulties [Kim+ ISCA 2014], [Liu+ ISCA 2013], [Mutlu IMW 2013], [Mutlu DATE 2017]. Difficult to significantly improve capacity, energy. Emerging memory technologies are promising:
Technology | Advantage | Drawbacks
3D-Stacked DRAM | higher bandwidth | smaller capacity
Reduced-Latency DRAM (e.g., RLDRAM, TL-DRAM) | lower latency | higher cost
Low-Power DRAM (e.g., LPDDR3, LPDDR4) | lower power | higher latency, higher cost
Non-Volatile Memory (NVM) (e.g., PCM, STTRAM, ReRAM, 3D Xpoint) | larger capacity | higher latency, higher dynamic power, lower endurance

14 Agenda. Major Trends Affecting Main Memory. The Memory Scaling Problem. Two Complementary Solution Directions: New Memory (DRAM) Architectures; Enabling Emerging (NVM) Technologies. Conclusion.

15 Limits of Charge Memory. Difficult charge placement and control. Flash: floating gate charge. DRAM: capacitor charge, transistor leakage. Reliable sensing becomes difficult as charge storage unit size reduces.

16 The DRAM Scaling Problem. DRAM stores charge in a capacitor (charge-based memory). Capacitor must be large enough for reliable sensing. Access transistor should be large enough for low leakage and high retention time. Scaling beyond 40-35 nm (2013) is challenging [ITRS, 2009]. DRAM capacity, cost, and energy/power are hard to scale.

17 As Memory Scales, It Becomes Unreliable. Data from all of Facebook's servers worldwide. Meza+, "Revisiting Memory Errors in Large-Scale Production Data Centers," DSN

18 Large-Scale Failure Analysis of DRAM Chips. Analysis and modeling of memory errors found in all of Facebook's server fleet. Justin Meza, Qiang Wu, Sanjeev Kumar, and Onur Mutlu, "Revisiting Memory Errors in Large-Scale Production Data Centers: Analysis and Modeling of New Trends from the Field," Proceedings of the 45th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), Rio de Janeiro, Brazil, June [Slides (pptx) (pdf)] [DRAM Error Model]

19 Infrastructures to Understand Such Issues. [Photo labels: temperature controller, FPGAs, heater, PC.] Kim+, "Flipping Bits in Memory Without Accessing Them: An Experimental Study of DRAM Disturbance Errors," ISCA

20 SoftMC: Open Source DRAM Infrastructure. Hasan Hassan et al., "SoftMC: A Flexible and Practical Open-Source Infrastructure for Enabling Experimental DRAM Studies," HPCA. Flexible. Easy to use (C++ API). Open-source: github.com/cmu-safari/softmc

21 SoftMC

22 A Curious Discovery [Kim et al., ISCA 2014]. One can predictably induce errors in most DRAM memory chips.

23 DRAM RowHammer. A simple hardware failure mechanism can create a widespread system security vulnerability.

24 Modern DRAM is Prone to Disturbance Errors. [Diagram: rows of cells; a hammered row is repeatedly opened and closed (wordline voltage toggling between low and high), with victim rows adjacent to it.] Repeatedly reading a row enough times (before memory gets refreshed) induces disturbance errors in adjacent rows in most real DRAM chips you can buy today. "Flipping Bits in Memory Without Accessing Them: An Experimental Study of DRAM Disturbance Errors" (Kim et al., ISCA 2014)

25 More on RowHammer Analysis. Yoongu Kim, Ross Daly, Jeremie Kim, Chris Fallin, Ji Hye Lee, Donghyuk Lee, Chris Wilkerson, Konrad Lai, and Onur Mutlu, "Flipping Bits in Memory Without Accessing Them: An Experimental Study of DRAM Disturbance Errors," Proceedings of the 41st International Symposium on Computer Architecture (ISCA), Minneapolis, MN, June [Slides (pptx) (pdf)] [Lightning Session Slides (pptx) (pdf)] [Source Code and Data]

26 Industry Is Writing Papers About It, Too

27 Industry Is Writing Papers About It, Too

28 How Do We Solve The Memory Problem? Fix it: make memory and controllers more intelligent. New interfaces, functions, architectures: system-mem codesign. Eliminate or minimize it: replace or (more likely) augment DRAM with a different technology. New technologies and system-wide rethinking of memory & storage. Embrace it: design heterogeneous memories (none of which are perfect) and map data intelligently across them. New models for data management and maybe usage. [Diagram: the computing stack: Problems, Algorithms, Programs, User, Runtime System (VM, OS, MM), ISA, Microarchitecture, Logic, Devices.] Solutions (to memory scaling) require software/hardware/device cooperation.

29 Agenda. Major Trends Affecting Main Memory. The Memory Scaling Problem. Two Complementary Solution Directions: New Memory (DRAM) Architectures; Enabling Emerging (NVM) Technologies. Conclusion.

30 Solution 1: New Memory Architectures. Overcome memory shortcomings with: memory-centric system design; novel memory architectures, interfaces, functions; better waste management (efficient utilization). Key issues to tackle: enable reliability at low cost → high capacity; reduce energy; improve latency and bandwidth; reduce waste (capacity, bandwidth, latency); enable computation close to data.

31 Solution 1: New Memory Architectures
Liu+, RAIDR: Retention-Aware Intelligent DRAM Refresh, ISCA
Kim+, A Case for Exploiting Subarray-Level Parallelism in DRAM, ISCA
Lee+, Tiered-Latency DRAM: A Low Latency and Low Cost DRAM Architecture, HPCA
Liu+, An Experimental Study of Data Retention Behavior in Modern DRAM Devices, ISCA
Seshadri+, RowClone: Fast and Efficient In-DRAM Copy and Initialization of Bulk Data, MICRO
Pekhimenko+, Linearly Compressed Pages: A Main Memory Compression Framework, MICRO
Chang+, Improving DRAM Performance by Parallelizing Refreshes with Accesses, HPCA
Khan+, The Efficacy of Error Mitigation Techniques for DRAM Retention Failures: A Comparative Experimental Study, SIGMETRICS
Luo+, Characterizing Application Memory Error Vulnerability to Optimize Data Center Cost, DSN
Kim+, Flipping Bits in Memory Without Accessing Them: An Experimental Study of DRAM Disturbance Errors, ISCA
Lee+, Adaptive-Latency DRAM: Optimizing DRAM Timing for the Common-Case, HPCA
Qureshi+, AVATAR: A Variable-Retention-Time (VRT) Aware Refresh for DRAM Systems, DSN
Meza+, Revisiting Memory Errors in Large-Scale Production Data Centers: Analysis and Modeling of New Trends from the Field, DSN
Kim+, Ramulator: A Fast and Extensible DRAM Simulator, IEEE CAL
Seshadri+, Fast Bulk Bitwise AND and OR in DRAM, IEEE CAL
Ahn+, A Scalable Processing-in-Memory Accelerator for Parallel Graph Processing, ISCA
Ahn+, PIM-Enabled Instructions: A Low-Overhead, Locality-Aware Processing-in-Memory Architecture, ISCA
Lee+, Decoupled Direct Memory Access: Isolating CPU and IO Traffic by Leveraging a Dual-Data-Port DRAM, PACT
Seshadri+, Gather-Scatter DRAM: In-DRAM Address Translation to Improve the Spatial Locality of Non-unit Strided Accesses, MICRO
Lee+, Simultaneous Multi-Layer Access: Improving 3D-Stacked Memory Bandwidth at Low Cost, TACO
Hassan+, ChargeCache: Reducing DRAM Latency by Exploiting Row Access Locality, HPCA
Chang+, Low-Cost Inter-Linked Subarrays (LISA): Enabling Fast Inter-Subarray Data Migration in DRAM, HPCA
Chang+, Understanding Latency Variation in Modern DRAM Chips: Experimental Characterization, Analysis, and Optimization, SIGMETRICS
Khan+, PARBOR: An Efficient System-Level Technique to Detect Data Dependent Failures in DRAM, DSN
Hsieh+, Transparent Offloading and Mapping (TOM): Enabling Programmer-Transparent Near-Data Processing in GPU Systems, ISCA
Hashemi+, Accelerating Dependent Cache Misses with an Enhanced Memory Controller, ISCA
Boroumand+, LazyPIM: An Efficient Cache Coherence Mechanism for Processing-in-Memory, IEEE CAL
Pattnaik+, Scheduling Techniques for GPU Architectures with Processing-In-Memory Capabilities, PACT
Hsieh+, Accelerating Pointer Chasing in 3D-Stacked Memory: Challenges, Mechanisms, Evaluation, ICCD
Hashemi+, Continuous Runahead: Transparent Hardware Acceleration for Memory Intensive Workloads, MICRO
Khan+, A Case for Memory Content-Based Detection and Mitigation of Data-Dependent Failures in DRAM, IEEE CAL
Hassan+, SoftMC: A Flexible and Practical Open-Source Infrastructure for Enabling Experimental DRAM Studies, HPCA
Mutlu, The RowHammer Problem and Other Issues We May Face as Memory Becomes Denser, DATE
Lee+, Design-Induced Latency Variation in Modern DRAM Chips: Characterization, Analysis, and Latency Reduction Mechanisms, SIGMETRICS
Chang+, Understanding Reduced-Voltage Operation in Modern DRAM Devices: Experimental Characterization, Analysis, and Mechanisms, SIGMETRICS
Patel+, The Reach Profiler (REAPER): Enabling the Mitigation of DRAM Retention Failures via Profiling at Aggressive Conditions, ISCA
Seshadri and Mutlu, Simple Operations in Memory to Reduce Data Movement, ADCOM
Liu+, Concurrent Data Structures for Near-Memory Computing, SPAA
Khan+, Detecting and Mitigating Data-Dependent DRAM Failures by Exploiting Current Memory Content, MICRO
Seshadri+, Ambit: In-Memory Accelerator for Bulk Bitwise Operations Using Commodity DRAM Technology, MICRO
Avoid DRAM:
Seshadri+, The Evicted-Address Filter: A Unified Mechanism to Address Both Cache Pollution and Thrashing, PACT
Pekhimenko+, Base-Delta-Immediate Compression: Practical Data Compression for On-Chip Caches, PACT
Seshadri+, The Dirty-Block Index, ISCA
Pekhimenko+, Exploiting Compressed Block Size as an Indicator of Future Reuse, HPCA
Vijaykumar+, A Case for Core-Assisted Bottleneck Acceleration in GPUs: Enabling Flexible Data Compression with Assist Warps, ISCA
Pekhimenko+, Toggle-Aware Bandwidth Compression for GPUs, HPCA

32 Example: (Truly) In-Memory Computation. We can support in-DRAM COPY, ZERO, AND, OR, NOT, MAJ at low cost, using the analog computation capability of DRAM. Idea: activating multiple rows performs computation. 30-60X performance and energy improvement. Seshadri+, "Ambit: In-Memory Accelerator for Bulk Bitwise Operations Using Commodity DRAM Technology," MICRO. New memory technologies enable even more opportunities: memristors, resistive RAM, phase change memory, STT-MRAM, ... Can operate on data with minimal movement.
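The slide lists MAJ (bitwise majority) next to AND and OR. A minimal software sketch of how a three-input majority can yield AND and OR, assuming the third operand is a control row of all zeros or all ones; rows are modeled as Python integers, and the names `maj`, `row_and`, `row_or`, `row_not`, and `ROW_BITS` are illustrative, not taken from the slides. This models only the logic, not the analog DRAM behavior:

```python
ROW_BITS = 16                      # hypothetical row width for the sketch
ALL_ZEROS = 0
ALL_ONES = (1 << ROW_BITS) - 1

def maj(a: int, b: int, c: int) -> int:
    """Bitwise majority of three rows: each result bit is 1 if at least two inputs are 1."""
    return (a & b) | (b & c) | (a & c)

def row_and(a: int, b: int) -> int:
    # With an all-zeros control row, the majority is 1 only where both a and b are 1.
    return maj(a, b, ALL_ZEROS)

def row_or(a: int, b: int) -> int:
    # With an all-ones control row, the majority is 1 where either a or b is 1.
    return maj(a, b, ALL_ONES)

def row_not(a: int) -> int:
    # Bitwise complement, truncated to the row width.
    return ~a & ALL_ONES

and_result = row_and(0b1100, 0b1010)
or_result = row_or(0b1100, 0b1010)
```

The point of the sketch is that one primitive (majority over simultaneously activated rows) plus a constant row covers both AND and OR.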

33 Performance: In-DRAM Bitwise Operations

34 Energy of In-DRAM Bitwise Operations. Seshadri+, "Ambit: In-Memory Accelerator for Bulk Bitwise Operations Using Commodity DRAM Technology," MICRO

35 More on Ambit. Vivek Seshadri et al., "Ambit: In-Memory Accelerator for Bulk Bitwise Operations Using Commodity DRAM Technology," MICRO

36 Tesseract System for Graph Processing. Interconnected set of 3D-stacked memory+logic chips with simple cores. [Diagram labels: Host Processor; Memory-Mapped Accelerator Interface (Noncacheable, Physically Addressed); Memory; Logic; Crossbar Network; In-Order Core; LP; PF Buffer; MTP; Message Queue; DRAM Controller; NI.] Ahn+, "A Scalable Processing-in-Memory Accelerator for Parallel Graph Processing," ISCA 2015.

37 Tesseract Graph Processing Performance. >13X performance improvement on five graph processing algorithms. [Figure: speedup of DDR3-OoO, HMC-OoO, HMC-MC, Tesseract, Tesseract-LP, and Tesseract-LP-MTP; bar labels include +56%, +25%, 9.0x, 11.6x, 13.8x.] Ahn+, "A Scalable Processing-in-Memory Accelerator for Parallel Graph Processing," ISCA 2015.

38 Tesseract Graph Processing System Energy. >8X energy reduction. [Figure: energy of HMC-OoO vs. Tesseract with prefetching, broken down into memory layers, logic layers, and cores.] Ahn+, "A Scalable Processing-in-Memory Accelerator for Parallel Graph Processing," ISCA 2015.

39 More on Tesseract. Junwhan Ahn, Sungpack Hong, Sungjoo Yoo, Onur Mutlu, and Kiyoung Choi, "A Scalable Processing-in-Memory Accelerator for Parallel Graph Processing," Proceedings of the 42nd International Symposium on Computer Architecture (ISCA), Portland, OR, June [Slides (pdf)] [Lightning Session Slides (pdf)]

40 Accelerating Pointer Chasing with PIM. Kevin Hsieh, Samira Khan, Nandita Vijaykumar, Kevin K. Chang, Amirali Boroumand, Saugata Ghose, and Onur Mutlu, "Accelerating Pointer Chasing in 3D-Stacked Memory: Challenges, Mechanisms, Evaluation," Proceedings of the 34th IEEE International Conference on Computer Design (ICCD), Phoenix, AZ, USA, October

41 Agenda. Major Trends Affecting Main Memory. The Memory Scaling Problem. Two Complementary Solution Directions: New Memory (DRAM) Architectures; Enabling Emerging (NVM) Technologies. Conclusion.

42 Limits of Charge Memory. Difficult charge placement and control. Flash: floating gate charge. DRAM: capacitor charge, transistor leakage. Reliable sensing becomes difficult as charge storage unit size reduces.

43 Solution 2: Emerging Memory Technologies. Some emerging resistive memory technologies seem more scalable than DRAM (and they are non-volatile). Example: Phase Change Memory. Data stored by changing phase of material. Data read by detecting material's resistance. Expected to scale to 9 nm (2022 [ITRS 2009]). Prototyped at 20 nm (Raoux+, IBM JRD 2008). Expected to be denser than DRAM: can store multiple bits/cell. But emerging technologies have (many) shortcomings. Can they be enabled to replace/augment/surpass DRAM?

44 Solution 2: Emerging Memory Technologies
Lee+, Architecting Phase Change Memory as a Scalable DRAM Alternative, ISCA 09, CACM 10, IEEE Micro 10.
Meza+, Enabling Efficient and Scalable Hybrid Memories, IEEE Comp. Arch. Letters
Yoon, Meza+, Row Buffer Locality Aware Caching Policies for Hybrid Memories, ICCD
Kultursay+, Evaluating STT-RAM as an Energy-Efficient Main Memory Alternative, ISPASS
Meza+, A Case for Efficient Hardware-Software Cooperative Management of Storage and Memory, WEED
Lu+, Loose Ordering Consistency for Persistent Memory, ICCD
Zhao+, FIRM: Fair and High-Performance Memory Control for Persistent Memory Systems, MICRO
Yoon, Meza+, Efficient Data Mapping and Buffering Techniques for Multi-Level Cell Phase-Change Memories, TACO
Ren+, ThyNVM: Enabling Software-Transparent Crash Consistency in Persistent Memory Systems, MICRO
Chauhan+, NVMove: Helping Programmers Move to Byte-Based Persistence, INFLOW
Li+, Utility-Based Hybrid Memory Management, CLUSTER
Yu+, Banshee: Bandwidth-Efficient DRAM Caching via Software/Hardware Cooperation, MICRO

45 Promising Resistive Memory Technologies. PCM: inject current to change material phase; resistance determined by phase. STT-MRAM: inject current to change magnet polarity; resistance determined by polarity. Memristors/RRAM/ReRAM: inject current to change atomic structure; resistance determined by atom distance.

46 What is Phase Change Memory? Phase change material (chalcogenide glass) exists in two states. Amorphous: low optical reflectivity and high electrical resistivity. Crystalline: high optical reflectivity and low electrical resistivity. PCM is resistive memory: high resistance (0), low resistance (1). A PCM cell can be switched between states reliably and quickly.

47 How Does PCM Work? Write: change phase via current injection. SET: sustained current to heat cell above Tcryst. RESET: cell heated above Tmelt and quenched. Read: detect phase (amorphous/crystalline) via material resistance. [Diagram labels: large current, small current; memory element; access device; SET (crystalline): low resistance; RESET (amorphous): high resistance.] Photo courtesy: Bipin Rajendran, IBM. Slide courtesy: Moinuddin Qureshi, IBM.

48 Opportunity: PCM Advantages. Scales better than DRAM, Flash: requires current pulses, which scale linearly with feature size; expected to scale to 9 nm (2022 [ITRS]); prototyped at 20 nm (Raoux+, IBM JRD 2008). Can be denser than DRAM: can store multiple bits per cell due to large resistance range; prototypes with 2 bits/cell in ISSCC 08, 4 bits/cell by 2012. Non-volatile: retains data for >10 years at 85°C; no refresh needed, low idle power.

49 Phase Change Memory Properties. Surveyed prototypes from ITRS, IEDM, VLSI, ISSCC. Derived PCM parameters for F=90 nm. Lee, Ipek, Mutlu, Burger, "Architecting Phase Change Memory as a Scalable DRAM Alternative," ISCA. Lee et al., "Phase Change Technology and the Future of Main Memory," IEEE Micro Top Picks

51 Phase Change Memory: Pros and Cons. Pros over DRAM: better technology scaling (capacity and cost); non-volatile → persistent; low idle power (no refresh). Cons: higher latencies, ~4-15x DRAM (especially write); higher active energy, ~2-50x DRAM (especially write); lower endurance (a cell dies after ~10^8 writes); reliability issues (resistance drift). Challenges in enabling PCM as DRAM replacement/helper: mitigate PCM shortcomings; find the right way to place PCM in the system.

52 PCM-based Main Memory (I). How should PCM-based (main) memory be organized? Hybrid PCM+DRAM [Qureshi+ ISCA 09, Dhiman+ DAC 09]: how to partition/migrate data between PCM and DRAM.

53 PCM-based Main Memory (II). How should PCM-based (main) memory be organized? Pure PCM main memory [Lee et al., ISCA 09, Top Picks 10]: how to redesign the entire hierarchy (and cores) to overcome PCM shortcomings.

54 An Initial Study: Replace DRAM with PCM. Lee, Ipek, Mutlu, Burger, "Architecting Phase Change Memory as a Scalable DRAM Alternative," ISCA. Surveyed prototypes from (e.g.) IEDM, VLSI, ISSCC. Derived "average" PCM parameters for F=90 nm.

55 Results: Naïve Replacement of DRAM with PCM. Replace DRAM with PCM in a 4-core, 4MB L2 system. PCM organized the same as DRAM: row buffers, banks, peripherals. Result: 1.6x delay, 2.2x energy, 500-hour average lifetime. Lee, Ipek, Mutlu, Burger, "Architecting Phase Change Memory as a Scalable DRAM Alternative," ISCA

56 Architecting PCM to Mitigate Shortcomings. Idea 1: use multiple narrow row buffers in each PCM chip → reduces array reads/writes → better endurance, latency, energy. Idea 2: write into the array at cache block or word granularity → reduces unnecessary wear. [Diagram: DRAM vs. PCM organization.]
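Idea 2 can be illustrated with a small software model: a row buffer that tracks which words were modified and, on eviction, writes only those words back to the array. This is a sketch under assumptions, not the paper's hardware design; the class name `PcmRowBuffer`, the per-word dirty bits, and the 8-word row are hypothetical.

```python
class PcmRowBuffer:
    """Toy model of a row buffer that writes back only modified (dirty) words."""

    def __init__(self, row_data):
        self.data = list(row_data)              # buffered copy of the row
        self.dirty = [False] * len(self.data)   # one dirty bit per word

    def write_word(self, index, value):
        self.data[index] = value
        self.dirty[index] = True

    def evict(self, array_row):
        """Write dirty words into the array row; return the number of array word writes."""
        writes = 0
        for i, is_dirty in enumerate(self.dirty):
            if is_dirty:
                array_row[i] = self.data[i]
                writes += 1
        self.dirty = [False] * len(self.data)
        return writes

array_row = [0] * 8                 # hypothetical 8-word PCM row
buf = PcmRowBuffer(array_row)
buf.write_word(2, 7)
buf.write_word(5, 9)
words_written = buf.evict(array_row)   # 2 array writes instead of 8
```

Writing 2 words instead of the full 8-word row is the kind of wear reduction the slide refers to; the actual savings depend on how many words an application modifies per row.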

57 Results: Architected PCM as Main Memory. 1.2x delay, 1.0x energy, 5.6-year average lifetime. Scaling improves energy, endurance, density. Caveat 1: worst-case lifetime is much shorter (no guarantees). Caveat 2: intensive applications see large performance and energy hits. Caveat 3: optimistic PCM parameters?

58 More on PCM As Main Memory. Benjamin C. Lee, Engin Ipek, Onur Mutlu, and Doug Burger, "Architecting Phase Change Memory as a Scalable DRAM Alternative," Proceedings of the 36th International Symposium on Computer Architecture (ISCA), pages 2-13, Austin, TX, June. Slides (pdf)

59 More on PCM As Main Memory (II). Benjamin C. Lee, Ping Zhou, Jun Yang, Youtao Zhang, Bo Zhao, Engin Ipek, Onur Mutlu, and Doug Burger, "Phase Change Technology and the Future of Main Memory," IEEE Micro, Special Issue: Micro's Top Picks from 2009 Computer Architecture Conferences (MICRO TOP PICKS), Vol. 30, No. 1, pages 60-70, January/February

60 STT-MRAM as Main Memory. Magnetic Tunnel Junction (MTJ) device. Reference layer: fixed magnetic orientation. Free layer: parallel or anti-parallel. Magnetic orientation of the free layer determines the logical state of the device: high vs. low resistance. Write: push large current through the MTJ to change the orientation of the free layer. Read: sense current flow. [Diagram: MTJ stacks for logical 0 and logical 1 (reference layer, barrier, free layer); cell with word line, MTJ, access transistor, bit line, sense line.] Kultursay et al., "Evaluating STT-RAM as an Energy-Efficient Main Memory Alternative," ISPASS

61 STT-MRAM: Pros and Cons. Pros over DRAM: better technology scaling (capacity and cost); non-volatile → persistent; low idle power (no refresh). Cons: higher write latency; higher write energy; poor density (currently); reliability? Another level of freedom: can trade off non-volatility for lower write latency/energy (by reducing the size of the MTJ).

62 Architected STT-MRAM as Main Memory. 4-core, 4GB main memory, multiprogrammed workloads. ~6% performance loss, ~60% energy savings vs. DRAM. [Figure: performance vs. DRAM (axis 88% to 98%) and energy vs. DRAM (axis 0% to 100%) for STT-RAM (base) and STT-RAM (opt); energy broken down into ACT+PRE, WB, RB.] Kultursay+, "Evaluating STT-RAM as an Energy-Efficient Main Memory Alternative," ISPASS

63 More on STT-MRAM as Main Memory. Emre Kultursay, Mahmut Kandemir, Anand Sivasubramaniam, and Onur Mutlu, "Evaluating STT-RAM as an Energy-Efficient Main Memory Alternative," Proceedings of the 2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), Austin, TX, April. Slides (pptx) (pdf)

64 A More Viable Approach: Hybrid Memory Systems. [Diagram: CPU with a DRAM controller and a PCM controller.] DRAM: fast, durable; small, leaky, volatile, high-cost. Phase Change Memory (or Tech. X): large, non-volatile, low-cost; slow, wears out, high active energy. Hardware/software manage data allocation and movement to achieve the best of multiple technologies. Meza+, "Enabling Efficient and Scalable Hybrid Memories," IEEE Comp. Arch. Letters; Yoon+, "Row Buffer Locality Aware Caching Policies for Hybrid Memories," ICCD 2012 Best Paper Award.

65 Challenge and Opportunity: Providing the Best of Multiple Metrics with Multiple Memory Technologies

66 Challenge and Opportunity: Heterogeneous, Configurable, Programmable Memory Systems

67 Hybrid Memory Systems: Issues. Cache vs. main memory. Granularity of data movement/management: fine or coarse. Hardware vs. software vs. HW/SW cooperative. When to migrate data? How to design a scalable and efficient large cache?

68 Data Placement in Hybrid Memory. [Diagram: cores/caches and memory controllers; Channel A to Memory A (fast, small); Channel B (idle) to Memory B (large, slow); Page 1, Page 2.] Memory A is fast, but small. Load should be balanced on both channels. Which memory do we place each page in, to maximize system performance? Page migrations have performance and energy overhead.

69 Data Placement Between DRAM and PCM. Idea: characterize data access patterns and guide data placement in hybrid memory. Streaming accesses: as fast in PCM as in DRAM. Random accesses: much faster in DRAM. Idea: place random access data with some reuse in DRAM; streaming data in PCM. Yoon+, "Row Buffer Locality-Aware Data Placement in Hybrid Memories," ICCD 2012 Best Paper Award.
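One way to picture this placement idea is to count row buffer misses per page in PCM and move a page to DRAM once it has missed often enough. The sketch below assumes a single PCM row buffer, treats each page as one row, and uses a hypothetical threshold of 2; the names `RblaPlacer` and `MISS_THRESHOLD` are illustrative and the details of the published mechanism may differ.

```python
from collections import defaultdict

MISS_THRESHOLD = 2  # hypothetical: migrate after this many PCM row buffer misses

class RblaPlacer:
    """Toy row-buffer-locality-aware placement: pages that keep missing go to DRAM."""

    def __init__(self, threshold=MISS_THRESHOLD):
        self.threshold = threshold
        self.miss_counts = defaultdict(int)  # PCM row buffer misses per page
        self.in_dram = set()                 # pages migrated to DRAM
        self.open_row = None                 # page currently held in the PCM row buffer

    def access(self, page):
        if page in self.in_dram:
            return "dram"
        if page == self.open_row:
            return "pcm-row-hit"             # streaming-style access: served from the buffer
        self.open_row = page
        self.miss_counts[page] += 1
        if self.miss_counts[page] >= self.threshold:
            self.in_dram.add(page)           # random access with reuse: migrate
        return "pcm-row-miss"

placer = RblaPlacer()
for page in ["A", "A", "A", "B", "B", "B"]:      # streaming pattern
    placer.access(page)
streaming_migrated = set(placer.in_dram)
for page in ["X", "Y", "X", "Y"]:                # interleaved pattern with reuse
    placer.access(page)
```

In this toy run the streaming pages A and B each miss once and stay in PCM, while X and Y keep evicting each other from the row buffer and end up in DRAM.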

70 Hybrid vs. All-PCM/DRAM [ICCD 12]. 31% better performance than all PCM, within 29% of all DRAM performance. [Figure: normalized weighted speedup, maximum slowdown, and performance per watt for all-PCM, RBLA-Dyn, and 16GB all-DRAM configurations.] Yoon+, "Row Buffer Locality-Aware Data Placement in Hybrid Memories," ICCD 2012 Best Paper Award.

71 More on Hybrid Memory Data Placement. HanBin Yoon, Justin Meza, Rachata Ausavarungnirun, Rachael Harding, and Onur Mutlu, "Row Buffer Locality Aware Caching Policies for Hybrid Memories," Proceedings of the 30th IEEE International Conference on Computer Design (ICCD), Montreal, Quebec, Canada, September. Slides (pptx) (pdf)

72 Weaknesses of Existing Solutions. They are all heuristics that consider only a limited part of memory access behavior. They do not directly capture the overall system performance impact of data placement decisions. Example: none capture memory-level parallelism (MLP), the number of concurrent memory requests from the same application when a page is accessed. MLP affects how much page migration helps performance.

73 Importance of Memory-Level Parallelism. [Diagram, left: before migration, requests to Page 1 are served by Mem. B; after migration, by Mem. A.] Migrating one page reduces stall time by T. [Diagram, right: before migration, concurrent requests to Page 2 and Page 3 are served by Mem. B; after migrating only one of them, the request still served by Mem. B determines the stall time.] Must migrate two pages to reduce stall time by T: migrating one page alone does not help. Page migration decisions need to consider MLP.
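The two cases on this slide can be reproduced with a very small model, assuming that concurrent requests overlap fully so the application stalls for as long as the slowest outstanding request. The latencies `FAST` and `SLOW` are hypothetical units chosen for illustration.

```python
FAST, SLOW = 1, 3          # hypothetical latencies of Mem. A and Mem. B
T = SLOW - FAST            # stall time saved when a lone request moves to fast memory

def stall_time(latencies):
    """Stall time of fully overlapped concurrent requests: the slowest one dominates."""
    return max(latencies, default=0)

# Case 1: a single outstanding request (no MLP). Migrating its page saves T.
saved_single = stall_time([SLOW]) - stall_time([FAST])

# Case 2: two concurrent requests to two pages (MLP = 2).
saved_one_of_two = stall_time([SLOW, SLOW]) - stall_time([FAST, SLOW])
saved_both = stall_time([SLOW, SLOW]) - stall_time([FAST, FAST])
```

Under this model, migrating one of two concurrently accessed pages saves nothing, which is the slide's argument for making migration decisions MLP-aware.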

74 Our Goal [CLUSTER 2017]. A generalized mechanism that (1) directly estimates the performance benefit of migrating a page between any two types of memory, and (2) places only the performance-critical data in the fast memory.

75 Utility-Based Hybrid Memory Management. A memory manager that works for any hybrid memory, e.g., DRAM-NVM, DRAM-RLDRAM. Key idea: for each page, use comprehensive characteristics to calculate the estimated utility (i.e., performance impact) of migrating the page from one memory to the other in the system. Migrate only pages with the highest utility (i.e., pages that improve system performance the most when migrated). Li+, "Utility-Based Hybrid Memory Management," CLUSTER

76 Key Mechanisms of UH-MEM. For each page, estimate utility using a performance model. Application stall time reduction: how much would migrating a page benefit the performance of the application that the page belongs to? Application performance sensitivity: how much does the improvement of a single application's performance increase the overall system performance? Utility = ΔStallTime × Sensitivity. Migrate only pages whose utility exceeds the migration threshold from slow memory to fast memory. Periodically adjust the migration threshold.
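A minimal sketch of the selection step, assuming utility is the product of a page's estimated stall time reduction and its application's sensitivity, and that pages above a threshold are migrated in order of decreasing utility. The `Page` record, the field names, and all numeric values are hypothetical; how the two factors are measured and how the threshold is adjusted are not modeled here.

```python
from dataclasses import dataclass

@dataclass
class Page:
    name: str
    stall_time_reduction: float   # estimated benefit to the owning application
    sensitivity: float            # how much that application's speedup helps the system

def utility(page: Page) -> float:
    """Estimated system-level benefit of migrating the page to fast memory."""
    return page.stall_time_reduction * page.sensitivity

def pages_to_migrate(pages, threshold):
    """Pages whose utility exceeds the threshold, highest utility first."""
    chosen = [p for p in pages if utility(p) > threshold]
    return sorted(chosen, key=utility, reverse=True)

candidates = [
    Page("a", stall_time_reduction=100.0, sensitivity=0.5),   # utility 50
    Page("b", stall_time_reduction=10.0, sensitivity=0.5),    # utility 5
    Page("c", stall_time_reduction=40.0, sensitivity=1.0),    # utility 40
]
selected = [p.name for p in pages_to_migrate(candidates, threshold=20.0)]
```

Page "b" is left in slow memory because its estimated benefit does not justify the migration cost that the threshold stands in for.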

77 Results: System Performance. [Figure: normalized weighted speedup of ALL, FREQ, RBLA, and UH-MEM by workload memory intensity category (0%, 25%, 50%, 75%, 100%); bar labels include 9%, 3%, 5%.] UH-MEM improves system performance over the best state-of-the-art hybrid memory manager.

78 Results: Sensitivity to Slow Memory Latency. We vary tRCD and tWR of the slow memory. [Figure: weighted speedup of ALL, FREQ, RBLA, and UH-MEM versus slow memory latency multiplier; axis tick residue: x3.0 x3.0 x4.0 x4.0 x4.5 x12 x6.0 x16 x7.5 x20; bar labels 8%, 6%, 14%, 13%, 13%.] UH-MEM improves system performance for a wide variety of hybrid memory systems.

79 More on UH-MEM. Yang Li, Saugata Ghose, Jongmoo Choi, Jin Sun, Hui Wang, and Onur Mutlu, "Utility-Based Hybrid Memory Management," Proceedings of the 19th IEEE Cluster Conference (CLUSTER), Honolulu, Hawaii, USA, September [Slides (pptx) (pdf)]

80 Challenge and Opportunity: Enabling an Emerging Technology to Augment DRAM; Managing Hybrid Memories

81 Another Challenge: Designing Effective Large (DRAM) Caches

82 One Problem with Large DRAM Caches. A large DRAM cache requires a large metadata (tag + block-based information) store. How do we design an efficient DRAM cache? [Diagram: CPU issues LOAD X; metadata maps X → DRAM; memory controllers access X in DRAM (small, fast cache) in front of PCM (high capacity).]

83 Idea 1: Tags in Memory. Store tags in the same row as data in DRAM: metadata is stored in the same row as its data, so data and metadata can be accessed together. [Diagram: a DRAM row holding Cache block 0, Cache block 1, Cache block 2 alongside Tag 0, Tag 1, Tag 2.] Benefit: no on-chip tag storage overhead. Downsides: a cache hit is determined only after a DRAM access; a cache hit requires two DRAM accesses.

84 Idea 2: Cache Tags in SRAM. Recall Idea 1: store all metadata in DRAM to reduce metadata storage overhead. Idea 2: cache frequently-accessed metadata in on-chip SRAM. Cache only a small amount to keep the SRAM size small.

85 On Large DRAM Cache Design. Justin Meza, Jichuan Chang, HanBin Yoon, Onur Mutlu, and Parthasarathy Ranganathan, "Enabling Efficient and Scalable Hybrid Memories Using Fine-Granularity DRAM Cache Management," IEEE Computer Architecture Letters (CAL), February

86 DRAM Caches: Many Recent Options. Yu+, "Banshee: Bandwidth-Efficient DRAM Caching via Software/Hardware Cooperation," MICRO

87 Banshee [MICRO 2017]. Tracks presence in cache using the TLB and page table: no tag store needed for the DRAM cache. Enabled by a new lightweight lazy TLB coherence protocol. New bandwidth-aware frequency-based replacement policy.

88 More on Banshee. Xiangyao Yu, Christopher J. Hughes, Nadathur Satish, Onur Mutlu, and Srinivas Devadas, "Banshee: Bandwidth-Efficient DRAM Caching via Software/Hardware Cooperation," Proceedings of the 50th International Symposium on Microarchitecture (MICRO), Boston, MA, USA, October

89 Other Opportunities with Emerging Technologies. Merging of memory and storage, e.g., a single interface to manage all data. New applications, e.g., ultra-fast checkpoint and restore. More robust system design, e.g., reducing data loss. Processing tightly coupled with memory, e.g., enabling efficient search and filtering.

90 TWO-LEVEL STORAGE MODEL. [Diagram: the CPU accesses MEMORY (DRAM) via Ld/St: volatile, fast, byte-addressable; and STORAGE via file I/O: nonvolatile, slow, block-addressable.]

91 TWO-LEVEL STORAGE MODEL. [Diagram: the CPU accesses MEMORY (DRAM) via Ld/St: volatile, fast, byte-addressable; and STORAGE via file I/O: nonvolatile, slow, block-addressable; NVM (PCM, STT-RAM) sits between the two.] Non-volatile memories combine characteristics of memory and storage.

92 Two-Level Memory/Storage Model. The traditional two-level storage model is a bottleneck with NVM. Volatile data in memory → a load/store interface. Persistent data in storage → a file system interface. Problem: operating system (OS) and file system (FS) code to locate, translate, and buffer data becomes a performance and energy bottleneck with fast NVM stores. [Diagram, Two-Level Store: processor and caches; virtual memory with load/store and address translation to main memory; fopen, fread, fwrite, ... through the operating system and file system to storage (SSD/HDD); persistent (e.g., phase-change) memory.]

93 Unified Memory and Storage with NVM. Goal: unify memory and storage management in a single unit to eliminate wasted work to locate, transfer, and translate data. Improves both energy and performance. Simplifies the programming model as well. [Diagram, Unified Memory/Storage: processor and caches issue load/store to the Persistent Memory Manager, with feedback, in front of persistent (e.g., phase-change) memory.] Meza+, "A Case for Efficient Hardware-Software Cooperative Management of Storage and Memory," WEED

94 The Persistent Memory Manager (PMM). Persistent objects. Figure 2: Sample program with access to fi

    int main(void) {
      // data in file.dat is persistent
      FILE mydata = "file.dat";
      mydata = new int[64];
    }
    void updatevalue(int n, int value) {
      FILE mydata = "file.dat";
      mydata[n] = value; // value is persistent
    }

[Diagram: software issues Load, Store, and Hints from SW/OS/runtime; in hardware, the Persistent Memory Manager handles data layout, persistence, metadata, security, ...; below it: DRAM, Flash, NVM, HDD.] PMM uses access and hint information to allocate, locate, migrate, and access data in the heterogeneous array of devices.

95 Performace Beefits of a Sigle-Level Store User CPU User Memory Syscall CPU Syscall I/O 1.0 ~24X Normalized Executio Time ~5X HDD 2-level NVM 2-level Persistet Memory Meza+, A Case for Efficiet Hardware-Software Cooperative Maagemet of Storage ad Memory, WEED

96 Energy Benefits of a Single-Level Store
[Figure: fraction of total energy for HDD 2-level, NVM 2-level, and Persistent Memory, broken down into User CPU, Syscall CPU, DRAM, NVM, and HDD. Annotated differences: ~16X and ~5X.]
Meza+, A Case for Efficient Hardware-Software Cooperative Management of Storage and Memory, WEED

97 On Persistent Memory Benefits & Challenges
Justin Meza, Yixin Luo, Samira Khan, Jishen Zhao, Yuan Xie, and Onur Mutlu, "A Case for Efficient Hardware-Software Cooperative Management of Storage and Memory," Proceedings of the 5th Workshop on Energy-Efficient Design (WEED), Tel-Aviv, Israel, June. Slides (pptx) Slides (pdf)

98 Challenge and Opportunity
Combined Memory & Storage

99 Challenge and Opportunity
A Unified Interface to All Data

100 One Key Challenge in Persistent Memory
How to ensure consistency of system/data if all memory is persistent?
Two extremes
- Programmer transparent: Let the system handle it
- Programmer only: Let the programmer handle it
Many alternatives in-between

101 CHALLENGE: CRASH CONSISTENCY
[Figure: Persistent Memory System]
System crash can result in permanent data corruption in NVM

102 CRASH CONSISTENCY PROBLEM
Example: Add a node to a linked list
1. Link to next
2. Link to prev
System crash can result in inconsistent memory state
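A small simulation can make the linked-list example concrete. The sketch below models NVM as a dictionary of `next` pointers and a crash as a cut-off after some number of persisted writes. The helper names (`apply_until_crash`, `is_traversable`), the singly linked list, and the assumption that writes may reach NVM out of program order are all illustrative choices, not details taken from the slides.

```python
def apply_until_crash(nvm, writes, crash_after):
    """Return the NVM image left behind if only the first `crash_after` writes persist."""
    image = dict(nvm)
    for node, next_node in writes[:crash_after]:
        image[node] = next_node
    return image


def is_traversable(image, head):
    """Treat the list as consistent if following `next` pointers from `head` reaches None."""
    seen = set()
    node = head
    while node is not None:
        if node not in image or node in seen:
            return False  # dangling pointer or cycle
        seen.add(node)
        node = image[node]
    return True


# Persistent list A -> C; node B is inserted between A and C.
nvm = {"A": "C", "C": None}
link_to_next = ("B", "C")  # step 1: B.next = C
link_to_prev = ("A", "B")  # step 2: A.next = B

in_order = [link_to_next, link_to_prev]
reordered = [link_to_prev, link_to_next]  # assumed: write-back happens out of order

safe = apply_until_crash(nvm, in_order, crash_after=1)
broken = apply_until_crash(nvm, reordered, crash_after=1)

print(is_traversable(safe, "A"))    # B is not yet linked in, but A -> C is intact
print(is_traversable(broken, "A"))  # A points at B, whose next pointer never persisted
```

In this model the list stays walkable only when step 1 persists before step 2, which is why persistent-memory code needs some ordering or atomicity guarantee.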

103 CURRENT SOLUTIONS
Explicit interfaces to manage consistency
- NV-Heaps [ASPLOS 11], BPFS [SOSP 09], Mnemosyne [ASPLOS 11]
    AtomicBegin {
        Insert a new node;
    } AtomicEnd;
Limits adoption of NVM
- Have to rewrite code with clear partition between volatile and non-volatile data
- Burden on the programmers
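One common way to implement an explicit atomic region is undo logging: record the old value before each write, and roll back on recovery if the region never completed. The sketch below is a toy model of that idea. The class `ToyPersistentHeap` and its methods are hypothetical names for illustration; this is not the API of NV-Heaps, BPFS, or Mnemosyne, and it assumes every write happens inside an atomic region.

```python
class ToyPersistentHeap:
    """Toy model: `data` and `undo_log` both stand in for state that survives a crash."""

    def __init__(self):
        self.data = {}
        self.undo_log = []

    def atomic_begin(self):
        self.undo_log.clear()

    def write(self, key, value):
        # Log the old value before overwriting it, so recovery can undo the write.
        self.undo_log.append((key, key in self.data, self.data.get(key)))
        self.data[key] = value

    def atomic_end(self):
        # Clearing the log is the commit point of the atomic region.
        self.undo_log.clear()

    def recover(self):
        """Run after a crash: undo the writes of a region that never reached atomic_end."""
        for key, existed, old in reversed(self.undo_log):
            if existed:
                self.data[key] = old
            else:
                self.data.pop(key, None)
        self.undo_log.clear()


heap = ToyPersistentHeap()
heap.atomic_begin()
heap.write("A.next", "C")
heap.write("C.next", None)
heap.atomic_end()

# Insert B, but "crash" between the two pointer updates.
heap.atomic_begin()
heap.write("A.next", "B")
# Simulated crash: B.next was never written and atomic_end never ran.
heap.recover()
print(heap.data)
```

The programmer has to route every persistent update through `write` inside a begin/end pair, which illustrates the rewriting burden the slide describes.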

104 OUR APPROACH: ThyNVM
Goal: Software transparent consistency in persistent memory systems

105 ThyNVM: Summary
A new hardware-based checkpointing mechanism
- Checkpoints at multiple granularities to reduce both checkpointing latency and metadata overhead
- Overlaps checkpointing and execution to reduce checkpointing latency
- Adapts to DRAM and NVM characteristics
Performs within 4.9% of an idealized DRAM with zero cost consistency
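The general idea of checkpoint-based recovery can be shown with a deliberately simplified software model. The sketch below is not ThyNVM's mechanism: it takes a stop-the-world full copy at the end of each epoch, whereas the summary above describes hardware checkpointing at multiple granularities that overlaps with execution. The class `CheckpointedMemory` and its method names are hypothetical.

```python
import copy


class CheckpointedMemory:
    """Toy epoch-based checkpointing: recovery returns to the last completed checkpoint."""

    def __init__(self):
        self.working = {}     # state modified during the current epoch
        self.checkpoint = {}  # last consistent snapshot (assumed to live in NVM)

    def store(self, addr, value):
        # The program issues plain stores; it never calls a persistence API.
        self.working[addr] = value

    def load(self, addr):
        return self.working.get(addr)

    def end_epoch(self):
        # Assumed to be triggered by the system, not by the program.
        self.checkpoint = copy.deepcopy(self.working)

    def crash_and_recover(self):
        # Updates made after the last checkpoint are lost; the snapshot is restored.
        self.working = copy.deepcopy(self.checkpoint)


mem = CheckpointedMemory()
mem.store(0x10, "old")
mem.end_epoch()
mem.store(0x10, "new")   # not yet checkpointed
mem.store(0x20, "extra")
mem.crash_and_recover()
print(mem.load(0x10), mem.load(0x20))
```

The program only calls `store` and `load`, which is the sense in which consistency is transparent to software; the cost of the full copy in `end_epoch` is the kind of overhead that finer-grained, overlapped checkpointing tries to avoid.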

106 More About ThyNVM
Jinglei Ren, Jishen Zhao, Samira Khan, Jongmoo Choi, Yongwei Wu, and Onur Mutlu, "ThyNVM: Enabling Software-Transparent Crash Consistency in Persistent Memory Systems," Proceedings of the 48th International Symposium on Microarchitecture (MICRO), Waikiki, Hawaii, USA, December. [Slides (pptx) (pdf)] [Lightning Session Slides (pptx) (pdf)] [Poster (pptx) (pdf)] [Source Code]

107 Another Key Challenge in Persistent Memory
Programming Ease to Exploit Persistence

108 Tools/Libraries to Help Programmers
Himanshu Chauhan, Irina Calciu, Vijay Chidambaram, Eric Schkufza, Onur Mutlu, and Pratap Subrahmanyam, "NVMove: Helping Programmers Move to Byte-Based Persistence," Proceedings of the 4th Workshop on Interactions of NVM/Flash with Operating Systems and Workloads (INFLOW), Savannah, GA, USA, November. [Slides (pptx) (pdf)]

109 Agenda
- Major Trends Affecting Main Memory
- The Memory Scaling Problem
- Two Complementary Solution Directions
  - New Memory (DRAM) Architectures
  - Enabling Emerging (NVM) Technologies
- Conclusion

110 The Future of Emerging Technologies is Bright
Regardless of challenges in underlying technology and overlying problems/requirements
Can enable:
- Orders of magnitude improvements
- New applications and computing systems
[Figure: the computing stack. Problem, Algorithm, Program/Language, System Software, SW/HW Interface, Micro-architecture, Logic, Devices, Electrons.]
Yet, we have to
- Think across the stack
- Design enabling systems

111 If In Doubt, Refer to Flash Memory
A very doubtful emerging technology for at least two decades
Proceedings of the IEEE, Sept

112 Opportunities and Challenges of Emerging Memory Technologies
Onur Mutlu
September 11, 2017
ARM Research Summit

113 Overview Paper on Flash Reliability
Yu Cai, Saugata Ghose, Erich F. Haratsch, Yixin Luo, and Onur Mutlu, "Error Characterization, Mitigation, and Recovery in Flash Memory Based Solid State Drives," to appear in Proceedings of the IEEE,

114 Solution 2: Emerging Memory Technologies
Some emerging resistive memory technologies seem more scalable than DRAM (and they are non-volatile)
Example: Phase Change Memory
- Expected to scale to 9nm (2022 [ITRS])
- Expected to be denser than DRAM: can store multiple bits/cell
But, emerging technologies have shortcomings as well
Can they be enabled to replace/augment/surpass DRAM?
Lee+, Architecting Phase Change Memory as a Scalable DRAM Alternative, ISCA 09, CACM 10, IEEE Micro 10.
Meza+, Enabling Efficient and Scalable Hybrid Memories, IEEE Comp. Arch. Letters
Yoon, Meza+, Row Buffer Locality Aware Caching Policies for Hybrid Memories, ICCD
Kultursay+, Evaluating STT-RAM as an Energy-Efficient Main Memory Alternative, ISPASS
Meza+, A Case for Efficient Hardware-Software Cooperative Management of Storage and Memory, WEED
Lu+, Loose Ordering Consistency for Persistent Memory, ICCD
Zhao+, FIRM: Fair and High-Performance Memory Control for Persistent Memory Systems, MICRO
Yoon, Meza+, Efficient Data Mapping and Buffering Techniques for Multi-Level Cell Phase-Change Memories, TACO
Ren+, ThyNVM: Enabling Software-Transparent Crash Consistency in Persistent Memory Systems, MICRO
Chauhan+, NVMove: Helping Programmers Move to Byte-Based Persistence, INFLOW
Li+, Utility-Based Hybrid Memory Management, CLUSTER
Yu+, Banshee: Bandwidth-Efficient DRAM Caching via Software/Hardware Cooperation, MICRO

115 Combination: Hybrid Memory Systems
[Figure: a CPU with a DRAM Ctrl and a PCM Ctrl, attached to DRAM and to Technology X (e.g., PCM).]
- DRAM: fast, durable; small, leaky, volatile, high-cost
- Technology X (e.g., PCM): large, non-volatile, low-cost; slow, wears out, high active energy
Hardware/software manage data allocation and movement to achieve the best of multiple technologies
Meza+, Enabling Efficient and Scalable Hybrid Memories, IEEE Comp. Arch. Letters.
Yoon, Meza et al., Row Buffer Locality Aware Caching Policies for Hybrid Memories, ICCD 2012 Best Paper Award.
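As a minimal illustration of managing data allocation across two technologies, the sketch below places the most frequently accessed pages in a small DRAM and everything else in PCM. The access-count policy and the function name `place_pages` are assumptions chosen for brevity; the cited papers use other criteria (for example, row buffer locality), which this sketch does not model.

```python
from collections import Counter


def place_pages(access_trace, dram_capacity):
    """Map each page in the trace to "DRAM" (hottest pages) or "PCM" (the rest)."""
    counts = Counter(access_trace)
    # most_common returns pages sorted by access count, highest first.
    hot = {page for page, _ in counts.most_common(dram_capacity)}
    return {page: ("DRAM" if page in hot else "PCM") for page in counts}


trace = [1, 1, 1, 2, 2, 3, 4, 4, 4, 4]
placement = place_pages(trace, dram_capacity=2)
print(placement)
```

With room for two pages in DRAM, pages 4 and 1 (four and three accesses) are placed in DRAM, and pages 2 and 3 fall to PCM.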

116 Exploiting Memory Error Tolerance with Hybrid Memory Systems
[Figure: App/Data A, App/Data B, and App/Data C ordered by memory error vulnerability. Vulnerable data goes to reliable memory (ECC protected, well-tested chips); tolerant data goes to low-cost memory (NoECC or Parity, less-tested chips).]
On Microsoft's Web Search workload
- Reduces server hardware cost by 4.7 %
- Achieves single server availability target of %
Heterogeneous-Reliability Memory [DSN 2014]

117 More on Heterogeneous Reliability Memory
Yixin Luo, Sriram Govindan, Bikash Sharma, Mark Santaniello, Justin Meza, Aman Kansal, Jie Liu, Badriddine Khessib, Kushagra Vaid, and Onur Mutlu, "Characterizing Application Memory Error Vulnerability to Optimize Data Center Cost via Heterogeneous-Reliability Memory," Proceedings of the 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), Atlanta, GA, June. [Summary] [Slides (pptx) (pdf)] [Coverage on ZDNet]

118 Challenge and Opportunity
Providing the Best of Multiple Metrics with Multiple Memory Technologies

119 Challenge and Opportunity
Heterogeneous, Configurable, Programmable Memory Systems

UH-MEM: Utility-Based Hybrid Memory Management. Yang Li, Saugata Ghose, Jongmoo Choi, Jin Sun, Hui Wang, Onur Mutlu

UH-MEM: Utility-Based Hybrid Memory Management. Yang Li, Saugata Ghose, Jongmoo Choi, Jin Sun, Hui Wang, Onur Mutlu UH-MEM: Utility-Based Hybrid Memory Maagemet Yag Li, Saugata Ghose, Jogmoo Choi, Ji Su, Hui Wag, Our Mutlu 1 Executive Summary DRAM faces sigificat techology scalig difficulties Emergig memory techologies

More information

Scalable Many-Core Memory Systems Lecture 3, Topic 2: Emerging Technologies and Hybrid Memories

Scalable Many-Core Memory Systems Lecture 3, Topic 2: Emerging Technologies and Hybrid Memories Scalable Many-Core Memory Systems Lecture 3, Topic 2: Emerging Technologies and Hybrid Memories Prof. Onur Mutlu http://www.ece.cmu.edu/~omutlu onur@cmu.edu HiPEAC ACACES Summer School 2013 July 17, 2013

More information

18-740: Computer Architecture Recitation 5: Main Memory Scaling Wrap-Up. Prof. Onur Mutlu Carnegie Mellon University Fall 2015 September 29, 2015

18-740: Computer Architecture Recitation 5: Main Memory Scaling Wrap-Up. Prof. Onur Mutlu Carnegie Mellon University Fall 2015 September 29, 2015 18-740: Computer Architecture Recitation 5: Main Memory Scaling Wrap-Up Prof. Onur Mutlu Carnegie Mellon University Fall 2015 September 29, 2015 Review Assignments for Next Week Required Reviews Due Tuesday

More information

Morgan Kaufmann Publishers 26 February, COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 5

Morgan Kaufmann Publishers 26 February, COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 5 Morga Kaufma Publishers 26 February, 28 COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Chapter 5 Set-Associative Cache Architecture Performace Summary Whe CPU performace icreases:

More information

CMSC Computer Architecture Lecture 12: Virtual Memory. Prof. Yanjing Li University of Chicago

CMSC Computer Architecture Lecture 12: Virtual Memory. Prof. Yanjing Li University of Chicago CMSC 22200 Computer Architecture Lecture 12: Virtual Memory Prof. Yajig Li Uiversity of Chicago A System with Physical Memory Oly Examples: most Cray machies early PCs Memory early all embedded systems

More information

CMSC Computer Architecture Lecture 11: More Caches. Prof. Yanjing Li University of Chicago

CMSC Computer Architecture Lecture 11: More Caches. Prof. Yanjing Li University of Chicago CMSC 22200 Computer Architecture Lecture 11: More Caches Prof. Yajig Li Uiversity of Chicago Lecture Outlie Caches 2 Review Memory hierarchy Cache basics Locality priciples Spatial ad temporal How to access

More information

Course Site: Copyright 2012, Elsevier Inc. All rights reserved.

Course Site:   Copyright 2012, Elsevier Inc. All rights reserved. Course Site: http://cc.sjtu.edu.c/g2s/site/aca.html 1 Computer Architecture A Quatitative Approach, Fifth Editio Chapter 2 Memory Hierarchy Desig 2 Outlie Memory Hierarchy Cache Desig Basic Cache Optimizatios

More information

Morgan Kaufmann Publishers 26 February, COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 5.

Morgan Kaufmann Publishers 26 February, COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 5. Morga Kaufma Publishers 26 February, 208 COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Chapter 5 Virtual Memory Review: The Memory Hierarchy Take advatage of the priciple

More information

Master Informatics Eng. 2017/18. A.J.Proença. Memory Hierarchy. (most slides are borrowed) AJProença, Advanced Architectures, MiEI, UMinho, 2017/18 1

Master Informatics Eng. 2017/18. A.J.Proença. Memory Hierarchy. (most slides are borrowed) AJProença, Advanced Architectures, MiEI, UMinho, 2017/18 1 Advaced Architectures Master Iformatics Eg. 2017/18 A.J.Proeça Memory Hierarchy (most slides are borrowed) AJProeça, Advaced Architectures, MiEI, UMiho, 2017/18 1 Itroductio Programmers wat ulimited amouts

More information

Phase Change Memory An Architecture and Systems Perspective

Phase Change Memory An Architecture and Systems Perspective Phase Change Memory An Architecture and Systems Perspective Benjamin C. Lee Stanford University bcclee@stanford.edu Fall 2010, Assistant Professor @ Duke University Benjamin C. Lee 1 Memory Scaling density,

More information

CMSC Computer Architecture Lecture 10: Caches. Prof. Yanjing Li University of Chicago

CMSC Computer Architecture Lecture 10: Caches. Prof. Yanjing Li University of Chicago CMSC 22200 Computer Architecture Lecture 10: Caches Prof. Yajig Li Uiversity of Chicago Midterm Recap Overview ad fudametal cocepts ISA Uarch Datapath, cotrol Sigle cycle, multi cycle Pipeliig Basic idea,

More information

The University of Adelaide, School of Computer Science 22 November Computer Architecture. A Quantitative Approach, Sixth Edition.

The University of Adelaide, School of Computer Science 22 November Computer Architecture. A Quantitative Approach, Sixth Edition. Computer Architecture A Quatitative Approach, Sixth Editio Chapter 2 Memory Hierarchy Desig 1 Itroductio Programmers wat ulimited amouts of memory with low latecy Fast memory techology is more expesive

More information

Multi-Threading. Hyper-, Multi-, and Simultaneous Thread Execution

Multi-Threading. Hyper-, Multi-, and Simultaneous Thread Execution Multi-Threadig Hyper-, Multi-, ad Simultaeous Thread Executio 1 Performace To Date Icreasig processor performace Pipeliig. Brach predictio. Super-scalar executio. Out-of-order executio. Caches. Hyper-Threadig

More information

Multiprocessors. HPC Prof. Robert van Engelen

Multiprocessors. HPC Prof. Robert van Engelen Multiprocessors Prof. Robert va Egele Overview The PMS model Shared memory multiprocessors Basic shared memory systems SMP, Multicore, ad COMA Distributed memory multicomputers MPP systems Network topologies

More information

SOLVING THE DRAM SCALING CHALLENGE: RETHINKING THE INTERFACE BETWEEN CIRCUITS, ARCHITECTURE, AND SYSTEMS

SOLVING THE DRAM SCALING CHALLENGE: RETHINKING THE INTERFACE BETWEEN CIRCUITS, ARCHITECTURE, AND SYSTEMS SOLVING THE DRAM SCALING CHALLENGE: RETHINKING THE INTERFACE BETWEEN CIRCUITS, ARCHITECTURE, AND SYSTEMS Samira Khan MEMORY IN TODAY S SYSTEM Processor DRAM Memory Storage DRAM is critical for performance

More information

Chapter 4 Threads. Operating Systems: Internals and Design Principles. Ninth Edition By William Stallings

Chapter 4 Threads. Operating Systems: Internals and Design Principles. Ninth Edition By William Stallings Operatig Systems: Iterals ad Desig Priciples Chapter 4 Threads Nith Editio By William Stalligs Processes ad Threads Resource Owership Process icludes a virtual address space to hold the process image The

More information

CS2410 Computer Architecture. Flynn s Taxonomy

CS2410 Computer Architecture. Flynn s Taxonomy CS2410 Computer Architecture Dept. of Computer Sciece Uiversity of Pittsburgh http://www.cs.pitt.edu/~melhem/courses/2410p/idex.html 1 Fly s Taxoomy SISD Sigle istructio stream Sigle data stream (SIMD)

More information

18-740: Computer Architecture Recitation 4: Rethinking Memory System Design. Prof. Onur Mutlu Carnegie Mellon University Fall 2015 September 22, 2015

18-740: Computer Architecture Recitation 4: Rethinking Memory System Design. Prof. Onur Mutlu Carnegie Mellon University Fall 2015 September 22, 2015 18-740: Computer Architecture Recitation 4: Rethinking Memory System Design Prof. Onur Mutlu Carnegie Mellon University Fall 2015 September 22, 2015 Agenda Review Assignments for Next Week Rethinking Memory

More information

Transforming Irregular Algorithms for Heterogeneous Computing - Case Studies in Bioinformatics

Transforming Irregular Algorithms for Heterogeneous Computing - Case Studies in Bioinformatics Trasformig Irregular lgorithms for Heterogeeous omputig - ase Studies i ioiformatics Jig Zhag dvisor: Dr. Wu Feg ollaborator: Hao Wag syergy.cs.vt.edu Irregular lgorithms haracterized by Operate o irregular

More information

Phase Change Memory An Architecture and Systems Perspective

Phase Change Memory An Architecture and Systems Perspective Phase Change Memory An Architecture and Systems Perspective Benjamin Lee Electrical Engineering Stanford University Stanford EE382 2 December 2009 Benjamin Lee 1 :: PCM :: 2 Dec 09 Memory Scaling density,

More information

Service Oriented Enterprise Architecture and Service Oriented Enterprise

Service Oriented Enterprise Architecture and Service Oriented Enterprise Approved for Public Release Distributio Ulimited Case Number: 09-2786 The 23 rd Ope Group Eterprise Practitioers Coferece Service Orieted Eterprise ad Service Orieted Eterprise Ya Zhao, PhD Pricipal, MITRE

More information

CSC 220: Computer Organization Unit 11 Basic Computer Organization and Design

CSC 220: Computer Organization Unit 11 Basic Computer Organization and Design College of Computer ad Iformatio Scieces Departmet of Computer Sciece CSC 220: Computer Orgaizatio Uit 11 Basic Computer Orgaizatio ad Desig 1 For the rest of the semester, we ll focus o computer architecture:

More information

Appendix D. Controller Implementation

Appendix D. Controller Implementation COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Appedix D Cotroller Implemetatio Cotroller Implemetatios Combiatioal logic (sigle-cycle); Fiite state machie (multi-cycle, pipelied);

More information

CMSC22200 Computer Architecture Lecture 9: Out-of-Order, SIMD, VLIW. Prof. Yanjing Li University of Chicago

CMSC22200 Computer Architecture Lecture 9: Out-of-Order, SIMD, VLIW. Prof. Yanjing Li University of Chicago CMSC22200 Computer Architecture Lecture 9: Out-of-Order, SIMD, VLIW Prof. Yajig Li Uiversity of Chicago Admiistrative Stuff Lab2 due toight Exam I: covers lectures 1-9 Ope book, ope otes, close device

More information

n Explore virtualization concepts n Become familiar with cloud concepts

n Explore virtualization concepts n Become familiar with cloud concepts Chapter Objectives Explore virtualizatio cocepts Become familiar with cloud cocepts Chapter #15: Architecture ad Desig 2 Hypervisor Virtualizatio ad cloud services are becomig commo eterprise tools to

More information

Computer Graphics Hardware An Overview

Computer Graphics Hardware An Overview Computer Graphics Hardware A Overview Graphics System Moitor Iput devices CPU/Memory GPU Raster Graphics System Raster: A array of picture elemets Based o raster-sca TV techology The scree (ad a picture)

More information

Operating System Concepts. Operating System Concepts

Operating System Concepts. Operating System Concepts Chapter 4: Mass-Storage Systems Logical Disk Structure Logical Disk Structure Disk Schedulig Disk Maagemet RAID Structure Disk drives are addressed as large -dimesioal arrays of logical blocks, where the

More information

Computer Architecture ELEC3441

Computer Architecture ELEC3441 CPU-Memory Bottleeck Computer Architecture ELEC44 CPU Memory Lecture 8 Cache Dr. Hayde Kwok-Hay So Departmet of Electrical ad Electroic Egieerig Performace of high-speed computers is usually limited by

More information

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe Copyright 2016 Ramez Elmasri ad Shamkat B. Navathe CHAPTER 20 Itroductio to Trasactio Processig Cocepts ad Theory Copyright 2016 Ramez Elmasri ad Shamkat B. Navathe Itroductio Trasactio Describes local

More information

Basic allocator mechanisms The course that gives CMU its Zip! Memory Management II: Dynamic Storage Allocation Mar 6, 2000.

Basic allocator mechanisms The course that gives CMU its Zip! Memory Management II: Dynamic Storage Allocation Mar 6, 2000. 5-23 The course that gives CM its Zip Memory Maagemet II: Dyamic Storage Allocatio Mar 6, 2000 Topics Segregated lists Buddy system Garbage collectio Mark ad Sweep Copyig eferece coutig Basic allocator

More information

SCI Reflective Memory

SCI Reflective Memory Embedded SCI Solutios SCI Reflective Memory (Experimetal) Atle Vesterkjær Dolphi Itercoect Solutios AS Olaf Helsets vei 6, N-0621 Oslo, Norway Phoe: (47) 23 16 71 42 Fax: (47) 23 16 71 80 Mail: atleve@dolphiics.o

More information

Big Data Capacity Planning: Achieving Right Sized Hadoop Clusters and Optimized Operations

Big Data Capacity Planning: Achieving Right Sized Hadoop Clusters and Optimized Operations Big Data Capacity Plaig: Achievig Right Sized Hadoop Clusters ad Optimized Operatios Abstract Busiesses are cosiderig more opportuities to leverage data for differet purposes, impactig resources ad resultig

More information

Session Initiated Protocol (SIP) and Message-based Load Balancing (MBLB)

Session Initiated Protocol (SIP) and Message-based Load Balancing (MBLB) F5 White Paper Sessio Iitiated Protocol (SIP) ad Message-based Load Balacig (MBLB) The ability to provide ew ad creative methods of commuicatios has esured a SIP presece i almost every orgaizatio. The

More information

Outline. CSCI 4730 Operating Systems. Questions. What is an Operating System? Computer System Layers. Computer System Layers

Outline. CSCI 4730 Operating Systems. Questions. What is an Operating System? Computer System Layers. Computer System Layers Outlie CSCI 4730 s! What is a s?!! System Compoet Architecture s Overview Questios What is a?! What are the major operatig system compoets?! What are basic computer system orgaizatios?! How do you commuicate

More information

Reducing SDRAM Energy Consumption in Embedded Systems Λ

Reducing SDRAM Energy Consumption in Embedded Systems Λ Reducig SDRAM Eergy Cosumptio i Embedded Systems Λ Jelea Trajkovic, Alexader Veidebaum Uiversity of Califoria, Irvie fjeleat,alexvg@ics.uci.edu Abstract DRAM eergy cosumptio i embedded systems ca be very

More information

Review: The ACID properties

Review: The ACID properties Recovery Review: The ACID properties A tomicity: All actios i the Xactio happe, or oe happe. C osistecy: If each Xactio is cosistet, ad the DB starts cosistet, it eds up cosistet. I solatio: Executio of

More information

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 4. The Processor. Single-Cycle Disadvantages & Advantages

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 4. The Processor. Single-Cycle Disadvantages & Advantages COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Chapter 4 The Processor Pipeliig Sigle-Cycle Disadvatages & Advatages Clk Uses the clock cycle iefficietly the clock cycle must

More information

Isn t It Time You Got Faster, Quicker?

Isn t It Time You Got Faster, Quicker? Is t It Time You Got Faster, Quicker? AltiVec Techology At-a-Glace OVERVIEW Motorola s advaced AltiVec techology is desiged to eable host processors compatible with the PowerPC istructio-set architecture

More information

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 4. The Processor. Part A Datapath Design

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 4. The Processor. Part A Datapath Design COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Chapter The Processor Part A path Desig Itroductio CPU performace factors Istructio cout Determied by ISA ad compiler. CPI ad

More information

Avid Interplay Bundle

Avid Interplay Bundle Avid Iterplay Budle Versio 2.5 Cofigurator ReadMe Overview This documet provides a overview of Iterplay Budle v2.5 ad describes how to ru the Iterplay Budle cofiguratio tool. Iterplay Budle v2.5 refers

More information

Rethinking Memory System Design Business as Usual in the Next Decade?

Rethinking Memory System Design Business as Usual in the Next Decade? Rethinking Memory System Design Business as Usual in the Next Decade? Onur Mutlu onur.mutlu@inf.ethz.ch http://users.ece.cmu.edu/~omutlu/ September 23, 2016 MST Workshop Keynote (Milan) The Main Memory

More information

Computer Architecture Lecture 12: Memory Interference and Quality of Service. Prof. Onur Mutlu ETH Zürich Fall November 2017

Computer Architecture Lecture 12: Memory Interference and Quality of Service. Prof. Onur Mutlu ETH Zürich Fall November 2017 Computer Architecture Lecture 12: Memory Iterferece ad Quality of Service Prof. Our Mutlu ETH Zürich Fall 2017 1 November 2017 Summary of Last Week s Lectures Cotrol Depedece Hadlig Problem Six solutios

More information

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe Copyright 2016 Ramez Elmasri ad Shamkat B. Navathe CHAPTER 22 Database Recovery Techiques Copyright 2016 Ramez Elmasri ad Shamkat B. Navathe Itroductio Recovery algorithms Recovery cocepts Write-ahead

More information

n Learn how resiliency strategies reduce risk n Discover automation strategies to reduce risk

n Learn how resiliency strategies reduce risk n Discover automation strategies to reduce risk Chapter Objectives Lear how resiliecy strategies reduce risk Discover automatio strategies to reduce risk Chapter #16: Architecture ad Desig Resiliecy ad Automatio Strategies 2 Automatio/Scriptig Resiliet

More information

Administrative UNSUPERVISED LEARNING. Unsupervised learning. Supervised learning 11/25/13. Final project. No office hours today

Administrative UNSUPERVISED LEARNING. Unsupervised learning. Supervised learning 11/25/13. Final project. No office hours today Admiistrative Fial project No office hours today UNSUPERVISED LEARNING David Kauchak CS 451 Fall 2013 Supervised learig Usupervised learig label label 1 label 3 model/ predictor label 4 label 5 Supervised

More information

Page 1. Why Care About the Memory Hierarchy? Memory. DRAMs over Time. Virtual Memory!

Page 1. Why Care About the Memory Hierarchy? Memory. DRAMs over Time. Virtual Memory! Why Care About the Memory Hierarchy? Memory Virtual Memory -DRAM Memory Gap (latecy) Reasos: Multi process systems (abstractio & memory protectio) Solutio: Tables (holdig per process traslatios) Fast traslatio

More information

Data Protection: Your Choice Is Simple PARTNER LOGO

Data Protection: Your Choice Is Simple PARTNER LOGO Data Protectio: Your Choice Is Simple PARTNER LOGO Is Your Data Truly Protected? The growth, value ad mobility of data are placig icreasig pressure o orgaizatios. IT must esure assets are properly protected

More information

Instruction and Data Streams

Instruction and Data Streams Advaced Architectures Master Iformatics Eg. 2017/18 A.J.Proeça Data Parallelism 1 (vector & SIMD extesios) (most slides are borrowed) AJProeça, Advaced Architectures, MiEI, UMiho, 2017/18 1 Istructio ad

More information

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe Copyright 2016 Ramez Elmasri ad Shamkat B. Navathe CHAPTER 18 Strategies for Query Processig Copyright 2016 Ramez Elmasri ad Shamkat B. Navathe Itroductio DBMS techiques to process a query Scaer idetifies

More information

CMSC Computer Architecture Lecture 5: Pipelining. Prof. Yanjing Li University of Chicago

CMSC Computer Architecture Lecture 5: Pipelining. Prof. Yanjing Li University of Chicago CMSC 22200 Computer Architecture Lecture 5: Pipeliig Prof. Yajig Li Uiversity of Chicago Admiistrative Stuff Lab1 Due toight Lab2: out later today; due 2 weeks from ow Review sessio this Friday Turig award

More information

A Row Buffer Locality-Aware Caching Policy for Hybrid Memories. HanBin Yoon Justin Meza Rachata Ausavarungnirun Rachael Harding Onur Mutlu

A Row Buffer Locality-Aware Caching Policy for Hybrid Memories. HanBin Yoon Justin Meza Rachata Ausavarungnirun Rachael Harding Onur Mutlu A Row Buffer Locality-Aware Caching Policy for Hybrid Memories HanBin Yoon Justin Meza Rachata Ausavarungnirun Rachael Harding Onur Mutlu Overview Emerging memories such as PCM offer higher density than

More information

Switching Hardware. Spring 2018 CS 438 Staff, University of Illinois 1

Switching Hardware. Spring 2018 CS 438 Staff, University of Illinois 1 Switchig Hardware Sprig 208 CS 438 Staff, Uiversity of Illiois Where are we? Uderstad Differet ways to move through a etwork (forwardig) Read sigs at each switch (datagram) Follow a kow path (virtual circuit)

More information

Rethinking Memory System Design for Data-Intensive Computing. Onur Mutlu October 11-16, 2013

Rethinking Memory System Design for Data-Intensive Computing. Onur Mutlu October 11-16, 2013 Rethinking Memory System Design for Data-Intensive Computing Onur Mutlu onur@cmu.edu October 11-16, 2013 Three Key Problems in Systems The memory system Data storage and movement limit performance & efficiency

More information

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe Copyright 2016 Ramez Elmasri ad Shamkat B. Navathe CHAPTER 19 Query Optimizatio Copyright 2016 Ramez Elmasri ad Shamkat B. Navathe Itroductio Query optimizatio Coducted by a query optimizer i a DBMS Goal:

More information

FAST BIT-REVERSALS ON UNIPROCESSORS AND SHARED-MEMORY MULTIPROCESSORS

FAST BIT-REVERSALS ON UNIPROCESSORS AND SHARED-MEMORY MULTIPROCESSORS SIAM J. SCI. COMPUT. Vol. 22, No. 6, pp. 2113 2134 c 21 Society for Idustrial ad Applied Mathematics FAST BIT-REVERSALS ON UNIPROCESSORS AND SHARED-MEMORY MULTIPROCESSORS ZHAO ZHANG AND XIAODONG ZHANG

More information

CS61C : Machine Structures

CS61C : Machine Structures CS 61C L24 VM II (1) ist.eecs.berkele.edu/~cs61c/su5 CS61C : Machie Structures Lecture #24: VM II Address Mappig: Virtual Address: VPN offset 25-8-2 Ad Carle idex ito page table located i phsical memor

More information

DCMIX: Generating Mixed Workloads for the Cloud Data Center

DCMIX: Generating Mixed Workloads for the Cloud Data Center DCMIX: Geeratig Mixed Workloads for the Cloud Data Ceter XigWag Xiog, Lei Wag, WaLig Gao, Rui Re, Ke Liu, Che Zheg, Yu We, YiLiag Istitute of Computig Techology, Chiese Academy of Scieces Bech 2018, Seattle,

More information

Computer Architecture

Computer Architecture Computer Architecture Overview Prof. Tie-Fu Che Dept. of Computer Sciece Natioal Chug Cheg Uiv Sprig 2002 Overview- Computer Architecture Course Focus Uderstadig the desig techiques, machie structures,

More information

Uniprocessors. HPC Prof. Robert van Engelen

Uniprocessors. HPC Prof. Robert van Engelen Uiprocessors HPC Prof. Robert va Egele Overview PART I: Uiprocessors PART II: Multiprocessors ad ad Compiler Optimizatios Parallel Programmig Models Uiprocessors Multiprocessors Processor architectures

More information

U8 Flash Memory Controller

U8 Flash Memory Controller U8 Flash Memory Cotroller U8 U8 Flash Memory Cotroller The Hyperstoe U8 family of Flash Memory Cotrollers together with provided applicatio ad Flash specific firmware offers a easy-to-use turkey platform

More information

Analysis Metrics. Intro to Algorithm Analysis. Slides. 12. Alg Analysis. 12. Alg Analysis

Analysis Metrics. Intro to Algorithm Analysis. Slides. 12. Alg Analysis. 12. Alg Analysis Itro to Algorithm Aalysis Aalysis Metrics Slides. Table of Cotets. Aalysis Metrics 3. Exact Aalysis Rules 4. Simple Summatio 5. Summatio Formulas 6. Order of Magitude 7. Big-O otatio 8. Big-O Theorems

More information

Scalable Many-Core Memory Systems Topic 2: Emerging Technologies and Hybrid Memories

Scalable Many-Core Memory Systems Topic 2: Emerging Technologies and Hybrid Memories Scalable Many-Core Memory Systems Topic 2: Emerging Technologies and Hybrid Memories Prof. Onur Mutlu http://www.ece.cmu.edu/~omutlu onur@cmu.edu HiPEAC ACACES Summer School 2013 July 15-19, 2013 What

More information
