MTJ-Based Nonvolatile Logic-in-Memory Architecture


2011 Spintronics Workshop on LSI @ Kyoto, Japan, June 13, 2011

MTJ-Based Nonvolatile Logic-in-Memory Architecture

Takahiro Hanyu
Center for Spintronics Integrated Systems, Tohoku University, Japan
Laboratory for Brainware Systems, Research Institute of Electrical Communication, Tohoku University, Japan

Acknowledgement: This work was supported by the project "Research and Development of Ultra-Low Power Spintronics-Based VLSIs" under the FIRST program of JSPS.

Outline
-- Nonvolatile Logic-in-Memory Architecture Overview
-- NV GP-Logic: Nonvolatile-FPGA
-- NV SP-Logic: Nonvolatile-TCAM
-- Conclusions & Future Prospects

Background: Increasing delay & power
-- Logic and memory modules are separated, with many interconnections between modules.
   Wire delay dominates chip performance. (Delay: long)
   Global wires require large drivers. (Power: large)
-- On-chip memory modules are volatile.
   The power supply must be applied continuously to the memory modules, so leakage current keeps flowing. (Static power: large)

Nonvolatile logic-in-memory architecture
-- Logic-in-memory architecture (proposed in 1969): storage elements are distributed over a logic-circuit plane.
-- Magnetic tunnel junction (MTJ) device: no volatility, unlimited endurance, fast writability, scalability, CMOS compatibility, 3-D stack capability.
-- Storage is nonvolatile (leakage current is cut off): static power is cut off.
-- MTJ devices are put on the CMOS layer (MTJ layer over CMOS layer): chip area is reduced.
-- Storage and logic are merged (global-wire count is reduced): wire delay is reduced and dynamic power is reduced.

Implementation of MTJ Device
-- The MTJ device is fabricated in the CMOS back-end process.
[Figure: cross-sectional SEM image of an MTJ device stacked over the MOS plane. Labels: substrate, diffusion and gates in the CMOS layer; metal layers M1 to M3 with TH, LE and UE; MTJ layer. Scale bar: 300 nm. Source: D. Suzuki, et al., VLSI Circuit Symp. 2009.]
The area cost of using MTJ devices is small.

Power-Gating Suitability
[Figure, conventional: a power switch gates V_DD to the logic and its volatile storage (leakage current), with external nonvolatile storage. Power-versus-time plot: Active, Standby, Active, with "Escape to NVM" before standby and "Reload from NVM" after it.]
[Figure, NV logic-in-memory architecture: a power switch (PMOS) gates V_DD to the logic and its nonvolatile storage (MTJ device). Power-versus-time plot: Active, Standby, Active.]
Power gating is performed without data backup/reload.

Nonvolatile Processor Architecture
[Figure: block diagrams of two processor generations on Si. 1st-step nonvolatile processor: labels Flash, DRAM, Spin RAM, NV SRAM, NV FF, GP-Logic and SP-Logic. 2nd-step nonvolatile processor: labels Spin RAM, NV SRAM, NVLIM GP-Logic and NVLIM SP-Logic.]
GP-Logic: general-purpose logic, realized as a nonvolatile field-programmable gate array (FPGA)
SP-Logic: special-purpose logic, realized as a nonvolatile ternary content-addressable memory (TCAM)


Nonvolatile Field-Programmable Gate Array (FPGA)
-- Arbitrary logic functions are programmed into and performed by the FPGA.
-- Power dissipation and hardware overhead are two major issues.
-- NV storage elements are distributed over the NV-FPGA (no external NVM is required).
[Figure: NV FPGA built from NV LUTs (lookup tables) containing NV devices. External NVM: not required.]
Leakage current elimination and short latency are possible.
How to design it? Nonvolatile logic-in-memory architecture.

Conventional nonvolatile FPGA
-- A CMOS logic circuit requires a high-voltage input swing.
[Figure: four MTJ devices deliver low-voltage signals, sense amplifiers raise them to high voltage, and CMOS combinational logic produces the output.]
How can a logic operation be performed directly on the low-swing signal from an MTJ device?

MOS/MTJ-hybrid circuitry (Proposed)
-- Current-mode logic (CML): the logic operation is performed even at a low voltage swing by using the small difference between current values.
[Figure: four MTJ devices feed current-mode combinational logic directly; the low-voltage signal is converted to a high-voltage output.]
Device count is reduced to 28% with little performance degradation.

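Operation example (XOR)
[Figure: MOS/MTJ-hybrid network with MTJs in the R_AP, R_P, R_P and R_AP states and a reference resistance R_REF. A sense amplifier compares the current I_F with the reference current I_REF to decide between Z = 0 and Z = 1; the slide marks the case I_F > I_REF.]
Truth table (A B Z): 0 0 1 / 0 1 0 / 1 0 0 / 1 1 1
The logic operation is performed at a low voltage swing by using a MOS/MTJ-hybrid network.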
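As a behavioral illustration of the current comparison on this slide, the following Python sketch models a 2-input nonvolatile LUT. It is not the fabricated circuit. The read voltage, the resistance values, the choice of R_REF midway between R_P and R_AP, the convention that a stored 1 is a parallel (low-resistance) MTJ, and the rule Z = 1 when I_F > I_REF are all assumptions made for the example; only the 100% TMR ratio is taken from the test-chip data in this talk.

```python
# Behavioral sketch of a 2-input nonvolatile LUT read by current comparison.
# All numeric values and naming below are illustrative assumptions.

R_P = 2_000.0             # ohms, parallel (low-resistance) state; assumed value
TMR = 1.00                # TMR ratio of 100%, as reported for the test chip
R_AP = R_P * (1 + TMR)    # from TMR = (R_AP - R_P) / R_P
R_REF = (R_P + R_AP) / 2  # assumed reference resistance between the two states
V_READ = 0.1              # volts, assumed low-swing read voltage


def program_lut(truth_table):
    """Store each (A, B) entry as an MTJ state: 1 -> R_P, 0 -> R_AP (assumed)."""
    return {inputs: (R_P if z else R_AP) for inputs, z in truth_table.items()}


def evaluate(lut, a, b):
    """Select one MTJ with the inputs and compare its current with I_REF."""
    i_f = V_READ / lut[(a, b)]   # current through the selected MTJ
    i_ref = V_READ / R_REF       # reference current
    return 1 if i_f > i_ref else 0


XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
xor_lut = program_lut(XOR)
xor_outputs = [evaluate(xor_lut, a, b) for a in (0, 1) for b in (0, 1)]
print(xor_outputs)
```

Reprogramming the same structure with a different truth table changes only the stored MTJ states, which is the sense in which storage and logic are merged.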

Test chip features
Fabricated 2-input LUT (D. Suzuki, et al., VLSI Circuit Symposium, June 2009)
[Figure: chip photo showing the selection transistor tree; 4 MTJ devices are stacked over the MOS layer.]

Process            0.14 µm MTJ/MOS, 1-Poly, 3-Metal
Area               287 µm²
MTJ size           50 nm × 150 nm
TMR ratio          100%
Write current      150 µA
Write time         10 ns
Standby current    0 A

Measured waveforms (Basic operations)
[Figure: measured waveforms of Input A, Input B and Output Z (two traces) over alternating phases, P: pre-charge and E: evaluate. Scales: 0.78 V/div, 100 µs/div.]
Output Z for inputs AB = 00, 01, 10, 11:
NOR     1 0 0 0
NAND    1 1 1 0
XOR     0 1 1 0
XNOR    1 0 0 1

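Immediate wakeup behavior
[Figure: measured waveforms of CLK, V_DD, A, B and Z across Active, Standby (V_DD = 0) and Active periods. The inputs AB step through 00, 01, 10, 11 in each active period. Scales: 0.78 V/div, 50 µs/div.]
Immediate wakeup behavior has also been measured successfully.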

Comparison of performances

                     Nonvolatile SRAM *1)    Proposed
Device counts        102 MOSs + 8 MTJs       29 MOSs + 4 MTJs
Area *2)             702 µm²                 287 µm²
Active delay *3)     140 ps                  185 ps
Active power *3)     26.7 µW                 17.5 µW
Standby power        0 µW                    0 µW

*1) W. Zhao, et al., Physica Status Solidi (a): Applications and Materials Science, 205, 6, pp. 1373-1377, May 2008.
*2) Estimation based on a 0.14 µm process.
*3) HSPICE simulation based on a 0.14 µm MOS/MTJ-hybrid process.
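The relative figures implied by this table can be checked with a few lines of Python. The numbers are copied from the table; the dictionary keys and the helper name `ratio` are just illustrative. Note that the 28% device-count figure quoted earlier in the talk appears to correspond to the MOS transistor count alone (29 of 102), which is an inference from the arithmetic rather than something the slides state.

```python
# Ratios of the proposed LUT to the nonvolatile-SRAM-based LUT,
# using the values listed in the comparison table.
nv_sram = {"mos": 102, "mtj": 8, "area_um2": 702.0, "delay_ps": 140.0, "power_uw": 26.7}
proposed = {"mos": 29, "mtj": 4, "area_um2": 287.0, "delay_ps": 185.0, "power_uw": 17.5}


def ratio(key):
    """Proposed value divided by the nonvolatile-SRAM value for one metric."""
    return proposed[key] / nv_sram[key]


mos_ratio = ratio("mos")          # transistor count
area_ratio = ratio("area_um2")    # layout area
delay_ratio = ratio("delay_ps")   # active delay (larger than 1 means slower)
power_ratio = ratio("power_uw")   # active power

for name, value in [("MOS count", mos_ratio), ("area", area_ratio),
                    ("delay", delay_ratio), ("power", power_ratio)]:
    print(f"{name}: {value:.2f}x")
```

Read this way, the proposed circuit uses roughly 0.28 times the transistors, 0.41 times the area and 0.66 times the active power, at about 1.32 times the delay.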


Ternary Content-Addressable Memory (TCAM)
[Figure: TCAM array with a bit-line driver (bit-line pairs BL_1 to BL_n), a search-line / word-line driver, an output driver and outputs OUT_1 to OUT_n, performing a fully parallel masked equality search.]

Input key         0 1 0 0 1 1 0
Stored word 1     1 1 0 0 0 1 X     OUT_1 = 0
Stored word 2     0 1 0 0 1 X X     OUT_2 = 1
Stored word n     1 0 1 0 1 X X     OUT_n = 0

-- Fully parallel search and fully parallel comparison can be done.
-- TCAM is a functional memory.
-- TCAM is a powerful data-search engine, useful for applications such as database machines and virus checkers in network routers.
-- TCAM must be implemented more compactly and with lower power dissipation.
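The masked equality search on this slide can be sketched in software as follows. This is only a functional model: the hardware compares all words in parallel, whereas the loop here visits them one by one, and the function names are illustrative. The key and the stored words are the ones shown on the slide.

```python
def word_matches(stored, key):
    """A stored word matches when every non-X bit equals the key bit."""
    return all(s == "X" or s == k for s, k in zip(stored, key))


def tcam_search(stored_words, key):
    """Return one match flag per stored word (1 = match, 0 = mismatch)."""
    return [1 if word_matches(word, key) else 0 for word in stored_words]


stored_words = ["110001X", "01001XX", "10101XX"]
key = "0100110"
outputs = tcam_search(stored_words, key)
print(outputs)
```

The X ("don't care") positions act as a per-word mask, which is what distinguishes a ternary CAM from a binary one.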

NV-TCAM Cell Function

Stored data B (b_1, b_2)    Search input S    Current comparison: I_Z is    Result on ML
0 (0, 1)                    0                 lower                         1
0 (0, 1)                    1                 higher                        0
1 (1, 0)                    0                 higher                        0
1 (1, 0)                    1                 lower                         1
X, don't care (0, 0)        0                 lower                         1
X, don't care (0, 0)        1                 lower                         1
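The cell function above can be written as a small Python model. The (b_1, b_2) encoding is the one given on the slide; the Boolean expression for a mismatch is one expression consistent with the table, derived here for illustration rather than taken from the slide, and the names are hypothetical.

```python
# Encoding of the stored ternary value B as the bit pair (b1, b2).
ENCODING = {"0": (0, 1), "1": (1, 0), "X": (0, 0)}


def cell_match(stored, s):
    """Return the match-line result of one cell: 1 = match, 0 = mismatch."""
    b1, b2 = ENCODING[stored]
    # Mismatch when a stored 1 meets S = 0 or a stored 0 meets S = 1.
    mismatch = (b1 and not s) or (b2 and s)
    return 0 if mismatch else 1


table = {(b, s): cell_match(b, s) for b in "01X" for s in (0, 1)}
for (b, s), ml in table.items():
    print(f"B={b} S={s} -> ML={ml}")
```

With (0, 0) stored, neither term can be true, so a "don't care" cell always reports a match.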

CMOS-based TCAM cell circuit
[Figure: cell with two 1-bit storage elements and an equality-detection (ED) circuit between V_DD and V_SS, connected to ML, WL, BL_1, BL_2 and a pair of search lines (SL); leakage current paths are marked in both storage elements.]
-- Transistor count: 12 (ED: 4T; 2-bit storage: 8T)
-- Input/output wires: 8 (BL: 2, WL: 1, V_DD & V_SS: 2, SL: 2, ML: 1)
-- Power must always be supplied: many leakage current paths
How can the cell be made compact and the leakage current cut off?

MOS/MTJ-hybrid TCAM cell circuit
S. Matsunaga, et al., Applied Physics Express (APEX), 2, 2, 023004, Feb. 2009.
[Figure: cell with 2-bit storage (MTJs) merged into the logic (MTJs and MOSs), connected to a shared ML/BL line and a pair of shared SL/WL lines.]
-- Merge storage into the logic circuit: compact (2T-2MTJ)
-- Share wires: 4 (ML/BL, SL/WL, no V_DD)
-- 3-D stack structure: great reduction of circuit area
Compact and nonvolatile TCAM cell with MTJ devices

Power-Gating Scheme of Bit-Serial NV-TCAM
S. Matsunaga, et al., JJAP 49 (2010) 04DM05.
[Figure: three panels showing the 1st-bit, 2nd-bit and 3rd-bit search of the search word 1 1 1 against the same array of stored words. Legend: TCAM cell in active mode; TCAM cell in standby mode (static power is suppressed); accumulator in active mode; sense amplifier in active mode; sense amplifier in standby mode (static power is suppressed).]

Stored words:
0 0 X
0 1 0
0 X 1
1 0 X
1 1 0
1 X 1
X 0 X
X 1 0
X X 1

The longer the TCAM word, the more effective the standby-power reduction.
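A rough way to see how few cells need power in a bit-serial search is to simulate it. The sketch below assumes, as a reading of the figure legend rather than something the slide states, that at each step only the cell in the current bit position is active and that a word which has already mismatched stays in standby for the remaining steps. The function name is illustrative; the search word and stored words are those shown on the slide.

```python
def bit_serial_search(stored_words, search_word):
    """Search one bit position per step, dropping words that mismatch.

    Returns the indices of matching words and the number of active
    cells at each step (one cell per word still under consideration).
    """
    alive = list(range(len(stored_words)))
    active_per_step = []
    for pos, s in enumerate(search_word):
        active_per_step.append(len(alive))
        alive = [i for i in alive if stored_words[i][pos] in ("X", s)]
    return alive, active_per_step


words = ["00X", "010", "0X1", "10X", "110", "1X1", "X0X", "X10", "XX1"]
matches, active = bit_serial_search(words, "111")
print([words[i] for i in matches], active)
```

Under these assumptions, at most 9 of the 27 cells are active at any step, and the number shrinks as words are eliminated; with longer words the active fraction per step becomes smaller still.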

TCAM cell circuit test chip
[Figure: chip photo, 3.0 µm × 9.8 µm, showing the TCAM cell, the reference cell, the dynamic current comparator in ML and the output generator in ML.]

Chip features
Process                  0.14 µm CMOS/MTJ, 1-Poly, 3-Metal
Total area               29.4 µm²
TCAM cell size           3.15 µm² (2.1 µm × 1.5 µm) a)
Cell structure           2MOSs-2MTJs
MTJ size                 50 nm × 200 nm
TMR ratio                167%
Average write current    274 µA (τ_p = 10 µs) b)
Standby current          0 A (power off)

a) A CMOS-based TCAM cell with 12 transistors, whose cell size is 17.54 µm² under a 0.18 µm CMOS process, has been reported. 8) Scaled down to a 0.14 µm CMOS process, the size of the conventional TCAM cell can be estimated as 10.61 µm². The fabricated TCAM cell is therefore reduced to 30% of the size of the conventional one. Moreover, the minimum size of the proposed TCAM cell can be considered to be 1/6 of the conventional one.
b) A faster write operation is possible with a higher write current, for example an average current of 327 µA for a 10 ns write.
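The size estimate in note a) can be reproduced with a short calculation. The sketch assumes that cell area scales with the square of the process feature size, which is the assumption that reproduces the 10.61 µm² figure; the slide itself only says "by scaling down". Variable names are illustrative.

```python
# Reported conventional CMOS TCAM cell and the two process nodes.
reported_cell_um2 = 17.54   # 12-transistor cell, 0.18 um CMOS
old_node_um = 0.18
new_node_um = 0.14

# Assumed quadratic scaling of area with feature size.
scaled_cell_um2 = reported_cell_um2 * (new_node_um / old_node_um) ** 2

# Fabricated MOS/MTJ-hybrid cell and chip dimensions from the table.
fabricated_cell_um2 = 2.1 * 1.5
total_area_um2 = 3.0 * 9.8

size_ratio = fabricated_cell_um2 / scaled_cell_um2

print(f"scaled conventional cell: {scaled_cell_um2:.2f} um^2")
print(f"fabricated cell: {fabricated_cell_um2:.2f} um^2")
print(f"ratio: {size_ratio:.0%}")
```

The same arithmetic confirms that the listed total area of 29.4 µm² is the product of the 3.0 µm and 9.8 µm dimensions marked on the chip photo.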

Waveforms of equality-search operations
[Figure: measured waveforms of CLK, search data S and OUT over alternating phases, P: precharge and E: evaluate, for stored data B = 1, B = 0 and B = X, each searched with S = 0 and S = 1. Scales: 780 mV, 10 µs.]
Bit-level equality search is successfully demonstrated.

Waveforms of sleep/wake-up operations
[Figure: measured waveforms of V_DD, CLK, S and OUT with stored data B = 0. V_DD alternates between Active and Power-off (Standby), with precharge (P) and evaluate (E) phases before and after each standby period. Scales: 780 mV, 10 µs.]
-- S = 0: OUT before standby = 1, OUT after standby = 1
-- S = 1: OUT before standby = 0, OUT after standby = 0
Instant sleep/wake-up behavior is successfully demonstrated.


Conclusions
-- Proposed a MOS/MTJ-hybrid circuit style (nonvolatile logic-in-memory circuits using MTJ devices).
-- Two typical applications of the logic-in-memory architecture: the NV-LUT circuit and the NV-TCAM. Both are compact and have no static power dissipation.
-- Confirmed basic behavior with test chips fabricated under an MTJ/CMOS process.
-- This could open an ultra-low-power logic-circuit paradigm.

Future Prospects and Issues:
1. Establish the fabrication line
2. Establish the CAD tools
3. Explore the appropriate application fields (impact on reliability enhancement)