Don't Forget the Memory: Automatic Block RAM Modelling, Optimization, and Architecture Exploration

1 Don't Forget the Memory: Automatic Block RAM Modelling, Optimization, and Architecture Exploration. S. Yazdanshenas, K. Tatsumura*, and V. Betz. University of Toronto, Canada. *Toshiba Corporation, Japan.

4 Memory: An Indispensable FPGA Component. On-chip: logic blocks and BRAM. Off-chip: DDR memory reached through a DDR controller, with ~1000x lower bandwidth, ~100x higher access energy, and ~10x higher latency.

5 Memory: An Indispensable FPGA Component. BRAM is growing in importance. Many applications (search engines, CNNs, ...) are BRAM-intensive. Can't fully utilize the FPGA's computation capacity without on-chip memory. [Figure: memory richness (RAM bits per logic element) versus number of logic elements (4-input-LUT equivalent) for Xilinx devices across process nodes from 220nm (Virtex) to 16nm (UltraScale+), with BRAM sizes annotated up to 18kb + 288kb.]

6 BRAM's Evolution. Memory-richness growth. Organization changes.

7 The Key Point. BRAMs can't be neglected: ~25% of area. BRAM architecture should respond to application demands. Need BRAM models to answer two questions: How efficient is an architecture? What's the best architecture? Flow: set of BRAM models + set of benchmark circuits -> VTR -> the best architecture.

10 BRAM Design is Difficult. BRAM design is challenging: analog nature of some components (e.g. the sense amplifier); variability of memory cells; custom layout style; significant FPGA-specific peripheral circuitry. Hand design of each candidate BRAM is infeasible.

11 Use existing tools?

12 BRAM Design is Different. CACTI underestimates area (chart annotations: ~1.9x, ~2.8x).

13 BRAM Design is Different. CACTI also underestimates read energy (chart annotations: ~5.5x, ~8.1x).

14 BRAM Design is Different. CACTI overestimates operating frequency (chart annotations: ~4.7x, ~3.9x).

15 Emerging Technologies. Model promising emerging memory technologies: Magnetic Tunnel Junction (MTJ), Phase Change Memory (PCM), Resistive RAM (RRAM). Ideal: model any technology with SPICE support. MTJ structure: free layer / insulator / fixed layer. Parallel magnetization gives the low resistance state (LRS); anti-parallel magnetization gives the high resistance state (HRS).

16 BRAM Design Tool

17 What do we need? Inputs: process model and architecture specifications. Tool: automatic transistor sizing. Outputs: VTR architecture file and detailed sizing solution.

18 COFFE: Logic & Routing. C. Chiasson et al., "COFFE: Fully-Automated Transistor Sizing for FPGAs," FPT.

19 COFFE BRAM Flow. Inputs: process model and architecture specifications. Steps: (1) SPICE netlist generation (individual sub-circuits and top-level evaluation paths); (2) update FPGA area, wires, and wire RC; (3) size all components (involves updating area, delay, ...); (4) finished sizing iterations? If no, repeat the update and sizing steps; if yes, continue; (5) measure RAM core power; (6) generate the VTR architecture file (to VTR).
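
The loop on slide 19 can be sketched as a fixed-point iteration. This is a toy illustration only, not COFFE's code: the closed-form delay model, the `wire_factor` parameter, the component names, and the functions `size_component` and `run_sizing_flow` are all assumptions made for the example. The real flow evaluates delay with SPICE rather than a formula.

```python
import math


def size_component(load, sizes=range(1, 65)):
    """Pick the size (toy integer units) that minimises a toy delay.

    Assumed delay model: load / size (drive strength) + 0.1 * size (self-loading).
    """
    return min(sizes, key=lambda s: load / s + 0.1 * s)


def run_sizing_flow(base_loads, wire_factor=0.5, max_iterations=20):
    """Alternate between updating wire loads and re-sizing until sizes stop changing."""
    sizes = {name: 1 for name in base_loads}
    for iteration in range(1, max_iterations + 1):
        # Step "update area, wires, wire RC": wire load grows with layout size.
        total_area = sum(sizes.values())
        wire_load = wire_factor * math.sqrt(total_area)
        # Step "size all components" against the updated loads.
        new_sizes = {
            name: size_component(base + wire_load)
            for name, base in base_loads.items()
        }
        # Step "finished sizing iterations?": stop once nothing changes.
        if new_sizes == sizes:
            return sizes, iteration
        sizes = new_sizes
    return sizes, max_iterations


if __name__ == "__main__":
    loads = {"row_decoder": 4.0, "wordline_driver": 16.0, "write_driver": 9.0}
    print(run_sizing_flow(loads))
```

The point of the sketch is the dependency cycle: component sizes change the area, the area changes wire loads, and wire loads change the best sizes, so the flow has to iterate rather than size each sub-circuit once.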

29 2-Bank SRAM-Based BRAM. Cells: two banks (Bank #1 and Bank #2 cell arrays) of dual-port SRAM cells, each cell with a wordline (WL) and bitlines (BL) for Port A and for Port B. Traditional decoders: row decoders and wordline drivers, column decoders. Read/write circuitry: write drivers and sense amplifiers, precharge and equalizer. Width-configurability circuitry: width-configurable decoders, input crossbar and level shifter. To/from routing: pipeline registers.

37 2-Bank MTJ-Based BRAM. Cells: two banks (Bank #1 and Bank #2 cell arrays) of MTJ-based cells, each bank with reference cells. Traditional decoders: row decoders and wordline drivers, column decoders. Read/write circuitry: sense amplifiers, write drivers, column selector and predischarger. Width-configurability circuitry: width-configurable decoders, input crossbar and level shifter. To/from routing: pipeline registers.
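
As a rough illustration of why the MTJ banks carry reference cells, the read decision can be modelled as a comparison against a reference resistance. This is a sketch under assumptions not stated on the slides: the reference is taken as the midpoint of the LRS and HRS resistances, HRS is arbitrarily mapped to logic 1, and `read_mtj_bit` is a hypothetical helper, not part of any tool.

```python
def read_mtj_bit(cell_resistance_ohm, lrs_ohm, hrs_ohm):
    """Return the stored bit by comparing the cell against a midpoint reference.

    Assumptions: reference = (LRS + HRS) / 2, and HRS encodes 1.
    """
    if not 0 < lrs_ohm < hrs_ohm:
        raise ValueError("expected 0 < LRS < HRS")
    reference_ohm = (lrs_ohm + hrs_ohm) / 2.0
    return 1 if cell_resistance_ohm > reference_ohm else 0


if __name__ == "__main__":
    # Hypothetical resistances, chosen only to exercise the function.
    print(read_mtj_bit(2000.0, lrs_ohm=2000.0, hrs_ohm=5000.0))
    print(read_mtj_bit(5000.0, lrs_ohm=2000.0, hrs_ohm=5000.0))
```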

40 Simulation in Context: SRAM. Circuit under simulation: precharge, sense amp, and write driver, plus the rest of the SRAM cells. Initial bitline voltages: Vdd and 0. Target voltage: 0.99 Vdd.
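
For intuition about the 0.99 Vdd target, a first-order estimate treats the bitline starting at 0 as a capacitor charged through a fixed resistance. This is an assumption for illustration only (the flow on these slides measures the time in SPICE with the real circuit), and `precharge_time` with its parameters is a hypothetical helper.

```python
import math


def precharge_time(resistance_ohm, capacitance_farad, target_fraction=0.99):
    """Time for a node charging from 0 V through an RC path to reach
    target_fraction * Vdd, from V(t) = Vdd * (1 - exp(-t / RC))."""
    if not 0.0 < target_fraction < 1.0:
        raise ValueError("target_fraction must be strictly between 0 and 1")
    return -resistance_ohm * capacitance_farad * math.log(1.0 - target_fraction)


if __name__ == "__main__":
    # Hypothetical values: 1 kOhm precharge path, 100 fF bitline.
    print(precharge_time(1e3, 100e-15))
```

Under this model a 0.99 Vdd target costs ln(100), roughly 4.6, time constants, which is why the precharge devices and bitline capacitance matter for cycle time.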

41 Simulation in Context: Worst-case SRAM Cell. Variation changes SRAM cell read/write currents significantly.

42 Simulation in Context: Worst-case SRAM Cell. Using the nominal memory cell will be inaccurate: BRAM energy will be underestimated and BRAM frequency will be overestimated.

43 Simulation in Context: Worst-case SRAM Cell. We don't have the SPICE model for the worst-case cell!

44 Simulation in Context: Worst-case SRAM Cell. Use Monte Carlo simulation to find the distribution of cell properties.

45 Monte Carlo Simulation: Worst-case Cell. [Figure: read speed; cumulative probability versus cell read current I_read (uA), marking the average cell and the worst cell, with a 4x annotation.] K. Tatsumura et al., "High Density, Low Energy, Magnetic Tunnel Junction Based Block RAMs for Memory-Rich FPGAs," FPT.
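
The Monte Carlo step can be sketched as: sample a varying device parameter, compute each cell's read current, and take a low quantile as the worst-case cell. Everything numeric here is an assumption for illustration (Gaussian threshold-voltage variation, an alpha-power current model in arbitrary units, the 0.1% quantile); the slides do not give the actual variation model, and `sample_read_currents` and `worst_case_current` are hypothetical names.

```python
import random


def sample_read_currents(n_cells, vdd=0.95, vth_mean=0.40, vth_sigma=0.04,
                         alpha=1.3, seed=0):
    """Draw per-cell read currents (arbitrary units) under assumed Vth variation."""
    rng = random.Random(seed)
    currents = []
    for _ in range(n_cells):
        vth = rng.gauss(vth_mean, vth_sigma)
        overdrive = max(0.0, vdd - vth)      # clamp so the power is well defined
        currents.append(overdrive ** alpha)  # assumed alpha-power law
    return currents


def worst_case_current(currents, quantile=0.001):
    """Return the read current at the given low quantile of the distribution."""
    ordered = sorted(currents)
    index = min(len(ordered) - 1, int(quantile * len(ordered)))
    return ordered[index]


if __name__ == "__main__":
    currents = sample_read_currents(10_000)
    average = sum(currents) / len(currents)
    worst = worst_case_current(currents)
    print(f"average={average:.3f} worst={worst:.3f} ratio={average / worst:.2f}")
```

The worst-case current found this way, rather than the nominal one, would then drive the read-timing simulation.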

49 Simulation in Context: Worst-Case Cell. Circuit under simulation: write driver, sense amplifier, and precharge, plus the rest of the SRAM cells. Initial bitline voltages: both Vdd. The accessed cell has the weakest read current. Target: a voltage difference between the bitlines.
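
A back-of-the-envelope version of this read simulation assumes the weakest cell discharges one bitline at a constant current while the other stays at Vdd, so the time to reach the target difference is C * dV / I. The constant-current assumption and the helper `bitline_sense_time` are illustrative only; the slides rely on SPICE for the real number.

```python
def bitline_sense_time(bitline_capacitance_farad, target_difference_volt,
                       read_current_amp):
    """Time for a constant read current to develop the target bitline difference."""
    if read_current_amp <= 0.0:
        raise ValueError("read current must be positive")
    return bitline_capacitance_farad * target_difference_volt / read_current_amp


if __name__ == "__main__":
    # Hypothetical values: 100 fF bitline, 100 mV target difference.
    nominal = bitline_sense_time(100e-15, 0.1, 20e-6)
    weakest = bitline_sense_time(100e-15, 0.1, 5e-6)
    print(nominal, weakest)
```

Because the time scales inversely with read current, a cell with a fraction of the nominal current lengthens the read by the same factor, which is why simulating with the nominal cell overestimates frequency.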

50 Validation and Results

51 Area Validation. Area results align well with commercial data.

52 SRAM-based BRAM Area Breakdown. SRAM cell area dominates for large BRAMs. For smaller BRAMs, other components are relevant.

53 Area: MTJ vs. SRAM. MTJ is more area-efficient, and increasingly so with BRAM size (chart annotations: ~3.1x, ~1.6x).

54 Frequency Validation. Reasonable alignment with commercial data. Less guardband. No aggressive banking.

55 Operating Frequency: MTJ vs. SRAM. SRAM is faster. The gap narrows with increasing size (chart annotations: ~3.6x, ~1.6x).

56 Simulation Results: Energy

57 SRAM-based BRAM Narrow Mode. FPGA RAM is configurable and is often used in narrower modes. Energy is mostly unaffected.

58 Energy Per Bit: MTJ vs. SRAM. MTJs are more efficient with large BRAMs. MTJ narrow mode is more efficient (chart annotations: ~1.7x, ~3.2x).

59 Worst-Case Cell Modeling Crucial. Use the nominal memory cell? Underestimates area and energy. Gets more severe with increasing memory size.
BRAM Capacity (Kbit)   Change in Delay   Change in Energy per Bit
8                      -21%              -9%
16                     -19%              -6%
32                     -27%              -15%
64                     -22%              -9%
[illegible]            [illegible]       -20%
[illegible]            [illegible]       -29%

60 Architecture Exploration

61 Architecture Exploration: RAM-Mapping Flow. BRAM models generated by COFFE can be used in architecture exploration. Area-oriented RAM mapping. 69 industrial circuits, used in development of the Stratix V memory architecture. We have partial data: number of logic blocks used; number, sizes, and types of logical RAMs. Gradually excluding less-memory-rich circuits.
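
One way to picture area-oriented RAM mapping is to count, for each logical RAM, the fewest physical BRAMs that cover it over the BRAM's width modes, then multiply by the BRAM area. This is a minimal sketch, not the mapping tool used in the talk: power-of-two width modes up to a `max_width` of 32, and the helpers `width_modes`, `blocks_needed`, and `total_bram_area`, are assumptions for the example.

```python
def ceil_div(a, b):
    """Integer ceiling division."""
    return (a + b - 1) // b


def width_modes(capacity_bits, max_width=32):
    """Enumerate assumed (depth, width) modes: widths 1, 2, 4, ... up to max_width."""
    modes = []
    width = 1
    while width <= max_width:
        modes.append((capacity_bits // width, width))
        width *= 2
    return modes


def blocks_needed(logical_depth, logical_width, capacity_bits, max_width=32):
    """Fewest physical BRAMs covering a logical RAM, tiling in depth and width."""
    return min(
        ceil_div(logical_depth, depth) * ceil_div(logical_width, width)
        for depth, width in width_modes(capacity_bits, max_width)
    )


def total_bram_area(logical_rams, capacity_bits, area_per_block, max_width=32):
    """Total BRAM area for a list of (depth, width) logical RAMs."""
    blocks = sum(
        blocks_needed(depth, width, capacity_bits, max_width)
        for depth, width in logical_rams
    )
    return blocks * area_per_block


if __name__ == "__main__":
    rams = [(1024, 32), (100, 8), (4096, 40)]  # hypothetical logical RAMs
    for capacity in (8 * 1024, 16 * 1024, 32 * 1024):
        print(capacity, total_bram_area(rams, capacity, area_per_block=1.0))
```

Sweeping the capacity with per-capacity block areas taken from the generated BRAM models is the kind of comparison that lets one BRAM size be picked over another.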

62 Stratix V-like SRAM-based BRAM. 16K is always the best. [Chart axis: memory richness.]

63 MTJ-based BRAM. MTJ always saves area. The best architecture changes: 32k or 64k is best (chart annotation: ~26%). [Chart axis: memory richness.]

64 Architecture Exploration: VTR Flow. Nine VTR benchmark circuits with memory. MTJ vs. SRAM. Architecture parameters: 32kb BRAM, every 8 columns; MTJ BRAM is smaller, so we get 2.3x more RAM blocks per column; ten 6-LUTs per logic block.

65 Architecture Exploration: VTR. Changes by switching to MTJ-based BRAMs:
Circuit           RAM/LUT Ratio  Block Area  Routing Area  Total Area  Block Delay  Routing Delay  Total Delay  Area-delay Product
mcml              1%             -11%        0             -5%         5%           -19%           -6%          -10%
LU32PEEng         7%             -14%        9%            -2%         9%           -6%            0            -1%
LU8PEEng          6%             -13%        4%            -5%         -4%          9%             3%           -2%
ch_intrinsics     2%             -15%        -2%           -10%        -1%          6%             3%           -7%
mkdelayworker32b  24%            -32%        -47%          -41%        60%          -40%           3%           -39%
mkpktmerge        198%           -57%        -48%          -53%        141%         -34%           12%          -47%
mksmadapter4b     8%             -16%        0             -10%        92%          -32%           9%           -2%
or1200            2%             -5%         -4%           0           1%           -7%            -2%          -3%
raygentop         1%             -3%         2%            -1%         9%           -17%           -4%          -5%
boundtop          1%             -3%         1%            -1%         8%           -5%            0            -2%
Geometric Mean    4%             -19%        -11%          -15%        25%          -16%           2%           -14%
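
The last column and the bottom row of the table can be checked with two small helpers. The sketch assumes the area-delay product is total area times total delay and that the geometric mean is taken over the ratios (1 + change); the slide does not spell out either definition, and the function names are illustrative.

```python
def area_delay_change(area_change, delay_change):
    """Relative change in area-delay product from relative area and delay changes."""
    return (1.0 + area_change) * (1.0 + delay_change) - 1.0


def geometric_mean_change(changes):
    """Geometric mean of relative changes, computed over the ratios (1 + change)."""
    if not changes:
        raise ValueError("need at least one value")
    product = 1.0
    for change in changes:
        product *= 1.0 + change
    return product ** (1.0 / len(changes)) - 1.0


if __name__ == "__main__":
    # mkpktmerge row: total area -53%, total delay +12%.
    print(round(100 * area_delay_change(-0.53, 0.12)))
    # Total Delay column, top to bottom.
    total_delay = [-0.06, 0.0, 0.03, 0.03, 0.03, 0.12, 0.09, -0.02, -0.04, 0.0]
    print(round(100 * geometric_mean_change(total_delay)))
```

Since the slide values are already rounded to whole percentages, recomputed products for some rows can differ from the printed column by a point.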

74 Conclusion. First transistor sizing tool capable of BRAM modeling, both SRAM-based and MTJ-based. Simulation results align well with available commercial data. COFFE now enables BRAM architecture exploration, through both RAM mapping and VTR.

Module 6 : Semiconductor Memories Lecture 30 : SRAM and DRAM Peripherals

Module 6 : Semiconductor Memories Lecture 30 : SRAM and DRAM Peripherals Module 6 : Semiconductor Memories Lecture 30 : SRAM and DRAM Peripherals Objectives In this lecture you will learn the following Introduction SRAM and its Peripherals DRAM and its Peripherals 30.1 Introduction

More information

EEM 486: Computer Architecture. Lecture 9. Memory

EEM 486: Computer Architecture. Lecture 9. Memory EEM 486: Computer Architecture Lecture 9 Memory The Big Picture Designing a Multiple Clock Cycle Datapath Processor Control Memory Input Datapath Output The following slides belong to Prof. Onur Mutlu

More information

Couture: Tailoring STT-MRAM for Persistent Main Memory. Mustafa M Shihab Jie Zhang Shuwen Gao Joseph Callenes-Sloan Myoungsoo Jung

Couture: Tailoring STT-MRAM for Persistent Main Memory. Mustafa M Shihab Jie Zhang Shuwen Gao Joseph Callenes-Sloan Myoungsoo Jung Couture: Tailoring STT-MRAM for Persistent Main Memory Mustafa M Shihab Jie Zhang Shuwen Gao Joseph Callenes-Sloan Myoungsoo Jung Executive Summary Motivation: DRAM plays an instrumental role in modern

More information

CMPEN 411 VLSI Digital Circuits Spring Lecture 22: Memery, ROM

CMPEN 411 VLSI Digital Circuits Spring Lecture 22: Memery, ROM CMPEN 411 VLSI Digital Circuits Spring 2011 Lecture 22: Memery, ROM [Adapted from Rabaey s Digital Integrated Circuits, Second Edition, 2003 J. Rabaey, A. Chandrakasan, B. Nikolic] Sp11 CMPEN 411 L22 S.1

More information

Z-RAM Ultra-Dense Memory for 90nm and Below. Hot Chips David E. Fisch, Anant Singh, Greg Popov Innovative Silicon Inc.

Z-RAM Ultra-Dense Memory for 90nm and Below. Hot Chips David E. Fisch, Anant Singh, Greg Popov Innovative Silicon Inc. Z-RAM Ultra-Dense Memory for 90nm and Below Hot Chips 2006 David E. Fisch, Anant Singh, Greg Popov Innovative Silicon Inc. Outline Device Overview Operation Architecture Features Challenges Z-RAM Performance

More information

The Memory Hierarchy 1

The Memory Hierarchy 1 The Memory Hierarchy 1 What is a cache? 2 What problem do caches solve? 3 Memory CPU Abstraction: Big array of bytes Memory memory 4 Performance vs 1980 Processor vs Memory Performance Memory is very slow

More information

Silicon Memories. Why store things in silicon? It s fast!!! Compatible with logic devices (mostly)

Silicon Memories. Why store things in silicon? It s fast!!! Compatible with logic devices (mostly) Memories and SRAM 1 Silicon Memories Why store things in silicon? It s fast!!! Compatible with logic devices (mostly) The main goal is to be cheap Dense -- The smaller the bits, the less area you need,

More information

Unleashing the Power of Embedded DRAM

Unleashing the Power of Embedded DRAM Copyright 2005 Design And Reuse S.A. All rights reserved. Unleashing the Power of Embedded DRAM by Peter Gillingham, MOSAID Technologies Incorporated Ottawa, Canada Abstract Embedded DRAM technology offers

More information

CS250 VLSI Systems Design Lecture 9: Memory

CS250 VLSI Systems Design Lecture 9: Memory CS250 VLSI Systems esign Lecture 9: Memory John Wawrzynek, Jonathan Bachrach, with Krste Asanovic, John Lazzaro and Rimas Avizienis (TA) UC Berkeley Fall 2012 CMOS Bistable Flip State 1 0 0 1 Cross-coupled

More information

2. Link and Memory Architectures and Technologies

2. Link and Memory Architectures and Technologies 2. Link and Memory Architectures and Technologies 2.1 Links, Thruput/Buffering, Multi-Access Ovrhds 2.2 Memories: On-chip / Off-chip SRAM, DRAM 2.A Appendix: Elastic Buffers for Cross-Clock Commun. Manolis

More information

Architectural Aspects in Design and Analysis of SOTbased

Architectural Aspects in Design and Analysis of SOTbased Architectural Aspects in Design and Analysis of SOTbased Memories Rajendra Bishnoi, Mojtaba Ebrahimi, Fabian Oboril & Mehdi Tahoori INSTITUTE OF COMPUTER ENGINEERING (ITEC) CHAIR FOR DEPENDABLE NANO COMPUTING

More information

Mohsen Imani. University of California San Diego. System Energy Efficiency Lab seelab.ucsd.edu

Mohsen Imani. University of California San Diego. System Energy Efficiency Lab seelab.ucsd.edu Mohsen Imani University of California San Diego Winter 2016 Technology Trend for IoT http://www.flashmemorysummit.com/english/collaterals/proceedi ngs/2014/20140807_304c_hill.pdf 2 Motivation IoT significantly

More information

ECE 485/585 Microprocessor System Design

ECE 485/585 Microprocessor System Design Microprocessor System Design Lecture 5: Zeshan Chishti DRAM Basics DRAM Evolution SDRAM-based Memory Systems Electrical and Computer Engineering Dept. Maseeh College of Engineering and Computer Science

More information

MTJ-Based Nonvolatile Logic-in-Memory Architecture

MTJ-Based Nonvolatile Logic-in-Memory Architecture 2011 Spintronics Workshop on LSI @ Kyoto, Japan, June 13, 2011 MTJ-Based Nonvolatile Logic-in-Memory Architecture Takahiro Hanyu Center for Spintronics Integrated Systems, Tohoku University, JAPAN Laboratory

More information

DRAF: A Low-Power DRAM-based Reconfigurable Acceleration Fabric

DRAF: A Low-Power DRAM-based Reconfigurable Acceleration Fabric DRAF: A Low-Power DRAM-based Reconfigurable Acceleration Fabric Mingyu Gao, Christina Delimitrou, Dimin Niu, Krishna Malladi, Hongzhong Zheng, Bob Brennan, Christos Kozyrakis ISCA June 22, 2016 FPGA-Based

More information

DRAF: A Low-Power DRAM-based Reconfigurable Acceleration Fabric

DRAF: A Low-Power DRAM-based Reconfigurable Acceleration Fabric DRAF: A Low-Power DRAM-based Reconfigurable Acceleration Fabric Mingyu Gao, Christina Delimitrou, Dimin Niu, Krishna Malladi, Hongzhong Zheng, Bob Brennan, Christos Kozyrakis ISCA June 22, 2016 FPGA-Based

More information

Evaluating STT-RAM as an Energy-Efficient Main Memory Alternative

Evaluating STT-RAM as an Energy-Efficient Main Memory Alternative Evaluating STT-RAM as an Energy-Efficient Main Memory Alternative Emre Kültürsay *, Mahmut Kandemir *, Anand Sivasubramaniam *, and Onur Mutlu * Pennsylvania State University Carnegie Mellon University

More information

Versatile RRAM Technology and Applications

Versatile RRAM Technology and Applications Versatile RRAM Technology and Applications Hagop Nazarian Co-Founder and VP of Engineering, Crossbar Inc. Santa Clara, CA 1 Agenda Overview of RRAM Technology RRAM for Embedded Memory Mass Storage Memory

More information

Reducing DRAM Latency at Low Cost by Exploiting Heterogeneity. Donghyuk Lee Carnegie Mellon University

Reducing DRAM Latency at Low Cost by Exploiting Heterogeneity. Donghyuk Lee Carnegie Mellon University Reducing DRAM Latency at Low Cost by Exploiting Heterogeneity Donghyuk Lee Carnegie Mellon University Problem: High DRAM Latency processor stalls: waiting for data main memory high latency Major bottleneck

More information

! Memory. " RAM Memory. " Serial Access Memories. ! Cell size accounts for most of memory array size. ! 6T SRAM Cell. " Used in most commercial chips

! Memory.  RAM Memory.  Serial Access Memories. ! Cell size accounts for most of memory array size. ! 6T SRAM Cell.  Used in most commercial chips ESE 57: Digital Integrated Circuits and VLSI Fundamentals Lec : April 5, 8 Memory: Periphery circuits Today! Memory " RAM Memory " Architecture " Memory core " SRAM " DRAM " Periphery " Serial Access Memories

More information

Semiconductor Memory Classification

Semiconductor Memory Classification ESE37: Circuit-Level Modeling, Design, and Optimization for Digital Systems Lec 6: November, 7 Memory Overview Today! Memory " Classification " Architecture " Memory core " Periphery (time permitting)!

More information

Silicon Memories. Why store things in silicon? It s fast!!! Compatible with logic devices (mostly) The main goal is to be cheap

Silicon Memories. Why store things in silicon? It s fast!!! Compatible with logic devices (mostly) The main goal is to be cheap Memories and SRAM 1 Silicon Memories Why store things in silicon? It s fast!!! Compatible with logic devices (mostly) The main goal is to be cheap Dense -- The smaller the bits, the less area you need,

More information

COMPUTER ARCHITECTURES

COMPUTER ARCHITECTURES COMPUTER ARCHITECTURES Random Access Memory Technologies Gábor Horváth BUTE Department of Networked Systems and Services ghorvath@hit.bme.hu Budapest, 2019. 02. 24. Department of Networked Systems and

More information

The Speed of Diversity: Exploring Complex FPGA Routing Topologies for the Global Metal Layer

The Speed of Diversity: Exploring Complex FPGA Routing Topologies for the Global Metal Layer The Speed of Diversity: Exploring Complex FPGA Routing Topologies for the Global Metal Layer Oleg Petelin and Vaughn Betz Department of Electrical and Computer Engineering University of Toronto, Toronto,

More information

Mark Redekopp, All rights reserved. EE 352 Unit 10. Memory System Overview SRAM vs. DRAM DMA & Endian-ness

Mark Redekopp, All rights reserved. EE 352 Unit 10. Memory System Overview SRAM vs. DRAM DMA & Endian-ness EE 352 Unit 10 Memory System Overview SRAM vs. DRAM DMA & Endian-ness The Memory Wall Problem: The Memory Wall Processor speeds have been increasing much faster than memory access speeds (Memory technology

More information

Lecture-14 (Memory Hierarchy) CS422-Spring

Lecture-14 (Memory Hierarchy) CS422-Spring Lecture-14 (Memory Hierarchy) CS422-Spring 2018 Biswa@CSE-IITK The Ideal World Instruction Supply Pipeline (Instruction execution) Data Supply - Zero-cycle latency - Infinite capacity - Zero cost - Perfect

More information

Emerging NVM Memory Technologies

Emerging NVM Memory Technologies Emerging NVM Memory Technologies Yuan Xie Associate Professor The Pennsylvania State University Department of Computer Science & Engineering www.cse.psu.edu/~yuanxie yuanxie@cse.psu.edu Position Statement

More information

Test and Reliability of Emerging Non-Volatile Memories

Test and Reliability of Emerging Non-Volatile Memories Test and Reliability of Emerging Non-Volatile Memories Elena Ioana Vătăjelu, Lorena Anghel TIMA Laboratory, Grenoble, France Outline Emerging Non-Volatile Memories Defects and Fault Models Test Algorithms

More information

Field Programmable Gate Array (FPGA) Devices

Field Programmable Gate Array (FPGA) Devices Field Programmable Gate Array (FPGA) Devices 1 Contents Altera FPGAs and CPLDs CPLDs FPGAs with embedded processors ACEX FPGAs Cyclone I,II FPGAs APEX FPGAs Stratix FPGAs Stratix II,III FPGAs Xilinx FPGAs

More information

ESE370: Circuit-Level Modeling, Design, and Optimization for Digital Systems

ESE370: Circuit-Level Modeling, Design, and Optimization for Digital Systems ESE370: Circuit-Level Modeling, Design, and Optimization for Digital Systems Lec 26: November 9, 2018 Memory Overview Dynamic OR4! Precharge time?! Driving input " With R 0 /2 inverter! Driving inverter

More information

Advanced 1 Transistor DRAM Cells

Advanced 1 Transistor DRAM Cells Trench DRAM Cell Bitline Wordline n+ - Si SiO 2 Polysilicon p-si Depletion Zone Inversion at SiO 2 /Si Interface [IC1] Address Transistor Memory Capacitor SoC - Memory - 18 Advanced 1 Transistor DRAM Cells

More information

Semiconductor Memory Classification. Today. ESE 570: Digital Integrated Circuits and VLSI Fundamentals. CPU Memory Hierarchy.

Semiconductor Memory Classification. Today. ESE 570: Digital Integrated Circuits and VLSI Fundamentals. CPU Memory Hierarchy. ESE 57: Digital Integrated Circuits and VLSI Fundamentals Lec : April 4, 7 Memory Overview, Memory Core Cells Today! Memory " Classification " ROM Memories " RAM Memory " Architecture " Memory core " SRAM

More information

Understanding Reduced-Voltage Operation in Modern DRAM Devices
