2011 Spintronics Workshop on LSI @ Kyoto, Japan, June 13, 2011 MTJ-Based Nonvolatile Logic-in-Memory Architecture Takahiro Hanyu Center for Spintronics Integrated Systems, Tohoku University, JAPAN Laboratory for Brainware Systems, Research Institute of Electrical Communication, Tohoku University, JAPAN Acknowledgement: This work was supported by the project Research and Development of Ultra-Low Power Spintronics-Based VLSIs under the FIRST program of JSPS.
Outline Nonvolatile Logic-in-Memory Architecture Overview NV GP-Logic: Nonvolatile-FPGA NV SP-Logic: Nonvolatile-TCAM Conclusions & Future Prospects 2
Background: Increasing delay & power Leakage current Logic and Memory modules are separated Many interconnections between modules On-chip memory modules are volatile. Wire delay dominates chip performance Global wires requires large drivers. Power supply must be continuously applied in memory modules. Delay: Long Power: Large Static power: Large 3
Nonvolatile logic-in-memory architecture Logic-in-Memory Architecture (proposed in 1969): Storage elements are distributed over a logic-circuit plane. Magnetic Tunnel Junction (MTJ) device MTJ layer CMOS layer No volatility Unlimited endurance Fast writability Scalability CMOS compatibility 3-D stack capability Storage is nonvolatile: (Leakage current is cut off) MTJ devices are put on the CMOS layer Storage/logic are merged: (global-wire count is reduced) Static power is cut off. Chip area is reduced. Wire delay is reduced. Dynamic power is reduced. 4
Implementation of MTJ Device MTJ Device Fabricated CMOS back-end process M3 TH M2 TH M1 M1 LE TH M3 TH M2 TH UE M1 MTJ Layer Metal Layers MTJ MTJ C Gate C Gate C CMOS Layer 300nm Substrate Diffusion MTJ device stacked over MOS Plane D. Suzuki, et al., VLSI Circuit Symp. 2009 Cross-sectional SEM image The. area cost using MTJ device is small. 5
Power-Gating Suitability Power switch External Nonvolatile storage Power switch V DD Volatile storage Leakage current Logic Power switch Power Active Standby Active Escape to NVM Reload from NVM Time Power switch (PMOS) V DD Power Active Standby Active Nonvolatile storage (MTJ device) Logic Time NV logic-in-memory architecture Power gating is performed without data backup/reload. 6
Nonvolatile Processor Architecture Flash DRAM GP-Logic: General-purpose logic SP-Logic: Special-purpose logic Spin RAM NV NV NV SRAM FF GP-Logic FF SP-Logic Si 1st -step Nonvolatile Processor Spin RAM NV NVLIM NVLIM SRAM GP-Logic SP-Logic Si 2nd-step Nonvolatile Processor Nonvolatile Field-Programmable Gate Array (FPGA) Nonvolatile Ternary Content- Addressable Memory (TCAM) 7
Outline Nonvolatile Logic-in-Memory Architecture Overview NV GP-Logic: Nonvolatile-FPGA NV SP-Logic: Nonvolatile-TCAM Conclusions & Future Prospects 8
Nonvolatile Field-Programmable Gate Array (FPGA) NV LUT (Lookup Table) NV device -- Arbitrary logic functions are performed and programmed by FPGA -- Power dissipation and hardware overhead are two major issues. -- NV storage elements are distributed over the NV-FPGA (no external NVM). NVM Not required! NV FPGA Leakage current elimination and short latency are possible. How to design? Nonvolatile logic-inmemory architecture 9
Conventional nonvolatile FPGA CMOS logic circuit requires high-voltage input swing. MTJ MTJ MTJ MTJ Low voltage Combinational logic (CMOS) (: Sense Amplifier) High Voltage Output How do we perform logic operation by using low swing signal from MTJ device directly? 10
MOS/MTJ-hybrid circuitry (Proposed) Current-mode logic (CML) Logic operation is performed even low swing voltage by using the small difference of the current value. MTJ MTJ MTJ MTJ Combinational logic (Current-Mode) Output Low voltage High voltage Device count is reduced to 28% with less performance degradation. 11
Operation example (XOR) Z = 0 Sense Amplifier Z = 1 I F I F > I REF I REF Truth table 0 1 R AP 1 0 1 R P R P R AP 0 1 0 0 1 R REF A B Z 0 0 1 0 1 0 1 0 0 1 1 1 Logic operation in low swing voltage is performed by using a MOS/MTJ-hybrid network. 12
Test chip features Fabricated 2-input LUT D. Suzuki, et al., VLSI Circuit Symposium, June 2009. Selection Transistor Tree Process 0.14µm MTJ/MOS 1-Poly, 3-Metal 4 MTJ devices are stacked over MOS layer Write Area 287µm 2 MTJ Size 50nm 150nm TMR Ratio 100% Current 150µA Time Standby Current 10ns 0A 13
Measured waveforms (Basic operations) Input A Input B P E P E P E P E P: Pre-Charge E: Evaluate Output Z Output Z A 0.78V/div B Z 100µs/div Z 1 0 0 0 1 1 1 0 NOR NAND 0 1 1 0 1 0 0 1 XOR XNOR 14
Immediate wakeup behavior Active Standby Active CLK V DD Z A B V DD = 0 A B 00 01 10 11 00 01 10 11 Z 0.78V/div 50µs/div Immediate wakeup behavior has also measured successfully. 15
Comparison of performances Device Counts Area *2) Nonvolatile SRAM *1) 102 MOSs + 8 MTJs 702 µm 2 Proposed 29 MOSs + 4 MTJs 287µm 2 Active Delay *3) 140 ps 185 ps Power *3) 26.7 µw 17.5 µw Standby Power 0 µw 0 µw *1) W. Zhao, et al., Physica Status SOLIDI a Application and Materials Science, 205, 6, 1373/1377, May 2008. *2) Estimation based on a 0.14µm process *3) HSPICE simulation based on a 0.14µm MOS/MTJ-hybrid process 16
Outline Nonvolatile Logic-in-Memory Architecture Overview NV GP-Logic: Nonvolatile-FPGA NV SP-Logic: Nonvolatile-TCAM Conclusions & Future Prospects 17
Ternary Content-Addressable Memory (TCAM) Bit-line driver Input key 2 2 2 BL 1 BL 1 BL 2 BL 2 BL n BL n 0 1 0 0 1 1 0 Search-line / Word-line driver 2 2 2 2 2 2 2 1 1 0 0 0 1 X 0 1 0 0 1 X X Stored words 1 0 1 0 1 X X OUT 1 OUT 2 OUT n Fully parallel masked equality search Output driver 0 () 1 () 0 () Fully parallel search and fully parallel comparison can be done. TCAM is a functional memory. TCAM is the powerful data-search engine useful for various applications such as database machine and virus checker in network router TCAM must be implemented more compactly with lower power dissipation. 18
NV-TCAM Cell Function Search Stored data input Current result comparison B (b 1,b 2 ) S ML 0 (0,1) 1 (1,0) X don t care (0,0) 0 I Z < I Z 1 I Z > I Z 0 I Z > I Z 1 I Z < I Z 0 I Z < I Z 1 I Z < I Z 1 () 0 () 0 () 1 () 1 () 1 () 19
CMOS-based TCAM cell circuit ML V DD 1-bit storage Equality-detection (ED) circuit 1-bit storage WL V SS Leakage current Leakage current BL 1 SL SL BL 2 Transistor counts : 12 (ED;4T, 2-bit storage;8t) Input/output wires : 8 (BL;2, WL;1, V DD &V SS ;2, SL;2, ML;1) Always supply the power : Many leakage current path How to realize compact & cut off the leakage current? 20
MOS/MTJ-hybrid TCAM cell circuit S. Matsunaga, et al. Applied Physics Express (APEX), 2, 2, 023004, Feb. 2009. ML/BL 2-bit storage (MTJs) Logic (MTJs & MOSs) SL /WL 1 SL/WL 1 Merge storage into logic circuit : Compact (2T-2MTJ) Share wires : 4 (ML/BL, SL/WL, No-V DD ) 3-D stack structure : Great reduction of circuit area Compact & nonvolatile TCAM cell with MTJ devices 21
Power-Gating Scheme of Bit-Serial NV-TCAM 1st-bit search 1 1 1 Search word 2nd-bit search 1 1 1 Search word S. Matsunaga, et al., JJAP 49 (2010) 04DM05. 3rd-bit search 1 1 1 Search word 0 0 X 0 0 X 0 0 X 0 1 0 0 1 0 0 1 0 0 X 1 0 X 1 0 X 1 1 0 X 1 0 X 1 0 X 1 1 0 1 1 0 1 1 0 1 X 1 1 X 1 1 X 1 X 0 X X 0 X X 0 X X 1 0 X 1 0 X 1 0 X X 1 X X 1 X X 1 TCAM cell in active mode TCAM cell in standby mode (Static power is suppressed.) Accumulator in active mode Sense amplifier in active mode Sense amplifier in standby mode (Static power is suppressed.) According to the word length of the TCAM, the effectiveness of the standby-power reduction is increased. 22
TCAM cell circuit test chip 3.0 µm Chip features 9.8 µm Output generator in ML TCAM cell Ref. cell Dynamic current comparator in ML Process 0.14µm CMOS/MTJ 1-Poly, 3-Metal Total area 29.4 µm 2 TCAM cell size Cell structure MTJ size 3.15 µm 2 (2.1 µm 1.5 µm) a) 2MOSs-2MTJs 50 nm 200 nm TMR ratio 167 % Average write current Standby current 274 µa (τ p = 10 µs) b) 0A (Power off) a) A CMOS-based TCAM cell with 12 transistors, whose cell size is 17.54 µm 2 under a 0.18 µm CMOS process, has been reported. 8) The size of the conventional TCAM cell can be estimated as 10.61 µm 2 under a 0.14 µm CMOS process by scaling down. Thus, the size of the fabricated TCAM cell is reduced to 30 % compared to that of the conventional one. Moreover, minimum size of the proposed TCAM cell can be considered as 1/6 of the conventional one. b) More high-speed write operation is possible with increase of write current. For example, with the average current of 327 µa at 10 ns write. 23
Waveforms of equality-search operations P : Precharge phase E : Evaluate phase P E P E P E P E P E P E CLK Stored data B=1 Stored data B=0 Stored data B=X S Search data S=0 S=1 S=0 S=1 S=0 S=1 OUT result 780mV 10µs Bit-level equality-search is successfully demonstrated. 24
Waveforms of sleep/wake-up operations V DD Active Power-off Active Power-off Active P E Standby P E P E Standby P E CLK Stored data B=0 Stored data B=0 S OUT 780mV 10µs S=0 S=0 S=1 S=1 OUT before =1 OUT after =1 OUT before =0 OUT after =0 Instant sleep/wake-up behavior is successfully demonstrated. 25
Outline Nonvolatile Logic-in-Memory Architecture Overview NV GP-Logic: Nonvolatile-FPGA NV SP-Logic: Nonvolatile-TCAM Conclusions & Future Prospects 26
Conclusions Propose a MOS/MTJ-hybrid circuit (nonvolatile logic-inmemory circuit using MTJ devices) style Two kinds of typical applications with logic-in-memory architecture; NV-LUT circuit and NV- TCAM Compact and no static power dissipation Confirm basic behavior with fabricated test chips under an MTJ/CMOS process. It could open an ultra-low-power logic-circuit paradigm Future Prospects and Issues: 1. Establish the fabrication line 2. Establish the CAD tools 3. Explore the appropriate application fields (Impact towards Reliability Enhancement ) 27