Reconfigurable Spintronic Fabric using Domain Wall Devices

Ronald F. DeMara, Ramtin Zand, Arman Roohi, Soheil Salehi, and Steven Pyle
Department of Electrical and Computer Engineering
University of Central Florida
Orlando, FL 32816-2362
December 20, 2014

We introduce a novel spintronic device and architecture to realize a reconfigurable fabric for super-high-performance computing at ultra-low power, while providing the greater resiliency that reconfiguration allows. Figure 1 shows characteristics of spintronic-based technologies and architectures along with their advantages and challenges. Spintronic devices such as Magnetic Tunnel Junctions (MTJs) and Domain Wall Magnets (DWMs) are proven for memory applications, and we research their potential in non-von Neumann computation for improved energy efficiency and throughput [1].

Figure 1: Taxonomy of Nanocomputing Architectures highlighting advantages of the proposed LIM approach.

Introduction

While spintronic-based neuromorphic architectures offer analog computation strategies [2], in this proposal we exploit reconfigurability and associative processing using a Logic-In-Memory (LIM) paradigm. LIM is compatible with conventional computing algorithms and integrates logical operations with data storage. This makes it well suited to parallel SIMD operations, because it eliminates the frequent memory accesses that are major contributors to energy consumption. Spin-based LIM architectures have the capability to increase computational throughput, reduce die area, provide instant-on functionality, and reduce static power consumption [3]. The feasibility of a low-power spintronic LIM chip has recently been demonstrated in [4] for database applications.

As shown in Figure 2, in order to facilitate a variety of highly data-parallel Air Force applications such as Image Processing, Weather Forecasting, Big Data Analysis, and Physics Simulations, we propose a novel reconfigurable fabric as a successor to FPGAs to allow unprecedented gains in nanocomputation. Specifically, we will research 1) energy-efficient associative computing paradigms and 2) a DW-based LIM reconfigurable fabric.

Figure 2: Non-Conventional Ultra Low Power Computing Architectures.

DWM logic devices, initially proposed in [5], have the potential to alleviate power consumption issues. Specifically, following the analytical expressions in [6], the wall energy density $\varepsilon_W$ depends sub-linearly on the material constants, $\varepsilon_W = 2\pi\sqrt{AK}$, and the wall width $\delta_W$ is given by $\delta_W = \pi\sqrt{A/K}$, where $A$ is the exchange constant and $K$ is the magnetic anisotropy constant. Domain Wall (DW) racetrack memory was fabricated by IBM in 2011 [7]. Our team utilized DW racetrack memory to implement a power-efficient GPGPU register file [8]. The results show that energy efficiency is significantly improved, as shown in Figure 3. Although DW devices could provide the high-speed switching necessary for a LIM architecture, reliability remains a major concern for DW logic. In order to enhance reliability and exploit associative processing, we propose a novel variant of the conventional racetrack array, called Domain Wall Nanomagnet-based Ladders (DWNL).

Figure 3: Parameters of DW Racetrack Memory for GPGPU register file [8].
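The wall expressions quoted from [6] can be evaluated numerically to see the sub-linear dependence. The sketch below assumes the standard square-root forms, eps_W = 2*pi*sqrt(A*K) and delta_W = pi*sqrt(A/K); the function names and the material constants are illustrative assumptions, not values from this proposal.

```python
import math


def wall_energy_density(A: float, K: float) -> float:
    """Wall energy per unit area, eps_W = 2*pi*sqrt(A*K), in J/m^2.

    A is the exchange constant (J/m), K the anisotropy constant (J/m^3).
    """
    return 2.0 * math.pi * math.sqrt(A * K)


def wall_width(A: float, K: float) -> float:
    """Wall width, delta_W = pi*sqrt(A/K), in metres."""
    return math.pi * math.sqrt(A / K)


# Illustrative constants only (assumed for this sketch, not from the source).
A = 1.0e-11  # J/m
K = 1.0e4    # J/m^3

for scale in (1, 4, 16):
    eps = wall_energy_density(A, scale * K)
    delta = wall_width(A, scale * K)
    print(f"K x{scale:>2}: eps_W = {eps:.3e} J/m^2, delta_W = {delta * 1e9:.1f} nm")
```

Under these forms, quadrupling K only doubles the wall energy density and halves the wall width, which is the sub-linear behaviour the text refers to.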

Reconfigurable Spintronic Fabric (RSF)

Unlike the fixed, pre-determined computing architectures that have recently been researched, a more effective approach is to cover the entire spectrum of applications by designing a Reconfigurable Spintronic Fabric (RSF). As shown in Figure 5, the RSF is a 2D array of Configurable Logic In Memory Blocks (CLIMBs), each comprised of an array of DWNL cells. Our team has in recent years developed techniques that use reconfiguration to address the challenges of AFRL-related applications with low energy budgets while maintaining availability and resilience [9-14].

Figure 4: (a) Domain Wall Nanomagnet-based Ladder (DWNL), (b) Reflexive Referencing Cell Operation Cycle 1, (c) Reflexive Referencing Cell Operation Cycle 2.

Conclusion

DWNL will be utilized in CLIMB arrays to store bits as the spin magnetization direction of different domains separated by domain walls, which can be shifted along a magnetic nanowire, with the last domain reserved for sensing. This novel Reflexive Referencing Cell consists of a reference MTJ that shares a common fixed layer and oxide layer with the last domain. Such a design has the potential to reduce the effect of cell-to-cell variation. Figure 4(a) delineates our proposed 2-cycle self-reflexive variation-tolerant reading scheme. Cycle 1 senses the reference and Cycle 2 senses the output, as shown in Figures 4(b) and 4(c) respectively. If the voltage from the second cycle is greater than the voltage from the first, the stored value is 1; otherwise it is 0. Moreover, DWNL is intrinsically compatible with associative computing instructions such as shift, compare, and write.

Figure 5: System Hierarchy of Nanocomputing Architecture: RSF, CLIMB, Ladder.

Figure 5 shows the proposed computing architecture, which provides an appropriate platform for ultra-low-power, data-intensive processing applications. The core populates the RSF DWNL cells with application data and writes the CLIMBs' instruction memory with the appropriate associative computing programs to perform the desired application. Only the final output data needs to be transmitted to the core.
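The two-cycle read and the shift, compare, and write instructions described above can be sketched as a toy software model. Everything here is an assumption made for illustration: the `Ladder` class, the voltage constants, the per-cell `offset` used to mimic cell-to-cell variation, and the modelling of a shift as a rotation so that data is preserved. It is not the device-level behaviour or the authors' instruction set.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical sense voltages in arbitrary units (not from the source).
V_LOW, V_REF, V_HIGH = 0.4, 0.5, 0.6


@dataclass
class Ladder:
    """Toy DWNL: a list of domains; the last entry is the sensing domain."""
    bits: List[int]

    def shift(self) -> None:
        """Move every domain one step; modelled as a rotation (assumption)."""
        self.bits = [self.bits[-1]] + self.bits[:-1]

    def write(self, bit: int) -> None:
        """Overwrite the domain currently at the sensing position."""
        self.bits[-1] = bit

    def read(self, offset: float = 0.0) -> int:
        """Two-cycle read: sense the reference, then the output, and compare.

        `offset` is a per-cell shift applied to both cycles, standing in for
        variation shared by the reference MTJ and the sensed domain.
        """
        v_ref = V_REF + offset                                  # cycle 1
        v_out = (V_HIGH if self.bits[-1] else V_LOW) + offset   # cycle 2
        return 1 if v_out > v_ref else 0


def compare(ladder: Ladder, key: List[int]) -> bool:
    """Bit-serial compare of the stored word against `key` (same length).

    Reads and shifts through every domain, so the ladder ends up restored.
    """
    match = True
    for k in reversed(key):
        match &= ladder.read() == k
        ladder.shift()
    return match


def associative_search(climb: List[Ladder], key: List[int]) -> List[int]:
    """Return the indices of ladders whose stored word equals `key`.

    Each ladder would compare in parallel in hardware; here it is a loop.
    """
    return [i for i, ladder in enumerate(climb) if compare(ladder, key)]


climb = [Ladder([1, 0, 1, 1]), Ladder([0, 0, 1, 0]), Ladder([1, 0, 1, 1])]
print(associative_search(climb, [1, 0, 1, 1]))
```

In this model the read result is unchanged by any offset common to both cycles, which is one way to picture why a self-referenced comparison can tolerate cell-to-cell variation. Only the list of matching indices leaves the block, mirroring the point that only final output data is returned to the core.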

References

[1] Kim, Jongyeon, et al. "Spin-Based Computing: Device Concepts, Current Status, and a Case Study on a High-Performance Microprocessor." Proceedings of the IEEE 103.1, 2015.
[2] Sharad, Mrigank, et al. "Energy-Efficient Non-Boolean Computing With Spin Neurons and Resistive Memory." IEEE Transactions on Nanotechnology, pp. 23-34, 2014.
[3] Zhang, Yue, et al. "Spintronics for low-power computing." Design, Automation and Test in Europe Conference and Exhibition (DATE), 2014.
[4] Jarollahi, Onizawa, et al. "A Nonvolatile Associative Memory-Based Context-Driven Search Engine Using 90 nm CMOS/MTJ-Hybrid Logic-in-Memory Architecture." IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 4, no. 4, pp. 460-474, 2014.
[5] Allwood, Dan A., et al. "Magnetic domain-wall logic." Science, pp. 1688-1692, 2005.
[6] Tauxe, Lisa. Essentials of Paleomagnetism. Univ. of California Press, 2010.
[7] Annunziata, A. J., et al. "Racetrack memory cell array with integrated magnetic tunnel junction readout." IEEE International Electron Devices Meeting (IEDM), 2011.
[8] Mao, Mengjie, et al. "Exploration of GPGPU register file architecture using domain-wall-shift-write based racetrack memory." Design Automation Conference (DAC), IEEE, 2014.
[9] N. Imran, R. F. DeMara, J. Lee, and J. Huang, "Self-adapting Resource Escalation for Resilient Signal Processing Architectures." Journal of Signal Processing Systems, 2013.
[10] R. Al-Haddad, R. Oreifej, R. A. Ashraf, and R. F. DeMara, "Sustainable Modular Adaptive Redundancy Technique Emphasizing Partial Reconfiguration for Reduced Power Consumption." International Journal of Reconfigurable Computing, 25 pages, 2011.
[11] M. Alawad, Y. Bai, R. F. DeMara, and M. Lin, "Energy-Efficient Multiplier-Less Discrete Convolver through Probabilistic Domain Transformation." ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, pp. 185-188, 2014.
[12] N. Imran and R. F. DeMara, "Heterogeneous Concurrent Error Detection (hced) Based On Output Anticipation," in Proceedings of 2011 International Conference on Reconfigurable Computing and FPGAs, Cancun, Mexico, November 30 - December 2, 2011, pp. 61-66.
[13] N. Imran, J. Lee, Y. Kim, M. Lin, and R. F. DeMara, "Fault-Mitigation by Adaptive Dynamic Reconfiguration for Survivable Signal-Processing Architectures," International Journal of Control and Automation, Volume 6, Number 2, pp. 111-120, April 2013.
[14] R. F. DeMara, K. Zhang, and C. A. Sharma, "Autonomic Fault-Handling and Refurbishment Using Throughput-Driven Assessment," Applied Soft Computing, Volume 11, Issue 2, March 2011, pp. 1588-1599.